Foreword



ETAPS’99 is the second instance of the European Joint Conferences on Theory
and Practice of Software. ETAPS is an annual federated conference that was
established in 1998 by combining a number of existing and new conferences.
This year it comprises five conferences (FOSSACS, FASE, ESOP, CC, TACAS),
four satellite workshops (CMCS, AS, WAGA, CoFI), seven invited lectures, two
invited tutorials, and six contributed tutorials.

The events that comprise ETAPS address various aspects of the system development
process, including specification, design, implementation, analysis and
improvement. The languages, methodologies and tools which support these activities
are all well within its scope. Different blends of theory and practice are
represented, with an inclination towards theory with a practical motivation on
one hand and soundly-based practice on the other. Many of the issues involved
in software design apply to systems in general, including hardware systems, and
the emphasis on software is not intended to be exclusive.

ETAPS is a loose confederation in which each event retains its own identity,
with a separate programme committee and independent proceedings. Its format
is open-ended, allowing it to grow and evolve as time goes by. Contributed talks
and system demonstrations are in synchronized parallel sessions, with invited
lectures in plenary sessions. Two of the invited lectures are reserved for “unifying”
talks on topics of interest to the whole range of ETAPS attendees. As an
experiment, ETAPS’99 also includes two invited tutorials on topics of special
interest. The aim of cramming all this activity into a single one-week meeting
is to create a strong magnet for academic and industrial researchers working on
topics within its scope, giving them the opportunity to learn about research in
related areas, and thereby to foster new and existing links between work in areas
that have hitherto been addressed in separate meetings.

ETAPS’99 has been organized by Jan Bergstra of CWI and the University of
Amsterdam together with Frans Snijders of CWI. Overall planning for ETAPS’99
was the responsibility of the ETAPS Steering Committee, whose current membership
is:

Andre Arnold (Bordeaux), Egidio Astesiano (Genoa), Jan Bergstra (Amsterdam),
Ed Brinksma (Enschede), Rance Cleaveland (Stony
Brook), Pierpaolo Degano (Pisa), Hartmut Ehrig (Berlin), Jose Fiadeiro
(Lisbon), Jean-Pierre Finance (Nancy), Marie-Claude Gaudel (Paris),
Susanne Graf (Grenoble), Stefan Jähnichen (Berlin), Paul Klint (Amsterdam),
Kai Koskimies (Tampere), Tom Maibaum (London), Ugo
Montanari (Pisa), Hanne Riis Nielson (Aarhus), Fernando Orejas
(Barcelona), Don Sannella (Edinburgh), Gert Smolka (Saarbrücken),
Doaitse Swierstra (Utrecht), Wolfgang Thomas (Aachen), Jerzy Tiuryn
(Warsaw), David Watt (Glasgow)
ETAPS’99 has received generous sponsorship from:

— KPN Research 

— Philips Research 

— The EU programme “Training and Mobility of Researchers” 

— CWI 

— The University of Amsterdam 

— The European Association for Programming Languages and Systems

— The European Association for Theoretical Computer Science 

I would like to express my sincere gratitude to all of these people and organizations,
the programme committee members of the ETAPS conferences, the
organizers of the satellite events, the speakers themselves, and finally Springer-Verlag
for agreeing to publish the ETAPS proceedings.

Edinburgh, January 1999 Donald Sannella 

ETAPS Steering Committee Chairman 
Preface

This volume contains the proceedings of the fifth international meeting on Tools 
and Algorithms for the Construction and Analysis of Systems (TACAS’99). 
TACAS’99 took place on 22-25 March 1999 in Amsterdam as a constituent con- 
ference of the European Joint Conferences on Theory and Practice of Software 
(ETAPS). More information about it may be found in the foreword. Previous 
TACAS meetings occurred in 1998 (Lisbon), 1997 (Twente), 1996 (Passau), and 
1995 (Aarhus). Like TACAS’98, TACAS’99 was a conference, while the meetings
before 1998 were workshops. All previous TACAS proceedings have been
published as volumes in Springer’s Lecture Notes in Computer Science series. 

TACAS’s mission is to provide a forum for researchers, developers and users 
interested in rigorously based tools for the construction and analysis of systems. 
The conference aims to bridge the gaps between different communities (including
but not limited to those devoted to formal methods, real-time, software
engineering, communications protocols, hardware, theorem proving, and programming
languages) that have traditionally had little interaction but share
common interests in and techniques for tool development. In particular, by providing
a venue for the discussion of common problems, heuristics, algorithms,
data structures and methodologies, TACAS hopes to support researchers in their
quest to improve the utility, reliability, flexibility and efficiency of tools for building
systems.

These proceedings contain an invited paper, 28 refereed contributions, a position
statement, and the text of an ETAPS tool demonstration that was reviewed
independently of the TACAS program committee. The 28 regular papers were 
selected from 82 submissions, which represents the largest number of submissions 
TACAS has had to date. The accepted papers cover a wide range of topics, as 
the table of contents indicates, although all have relevance to the development 
and deployment of tools. 

As Program Committee Chairman for TACAS, I would like to acknowledge 
the efforts of the Program Committee and paper reviewers. The obvious strength 
of the conference program is a testament to their thoughtful analyses of the submitted
papers and to the seriousness with which they approached the selection
process. I would also like to thank the other members of the TACAS Steering 
Committee for their guidance and advice in organizing the conference. 

Stony Brook, January 1999 W. Rance Cleaveland II

Program Committee Chairman 
TACAS’99 
Integrating Printed and Online Information



Thomas Friese¹, Tiziana Margaria², and Thomas Rakow³

¹ NeoMedia Technologies, Vienna (A), www.neom.com
² MetaFrame Technologies, Dortmund (D), www.metaframe.de
³ Springer Verlag, Heidelberg (D), www.springer.de

This LNCS volume concretizes a pilot project between NeoMedia Technologies,
MetaFrame Technologies, and the Springer Verlag aimed at flexibly combining
the strengths of printed and online documents. In a typical scenario, a combination
of technologies enables researchers and students who normally visit
a library to make copies to order their own high-quality print-outs of selected
literature by scanning bar codes identifying the relevant papers as well as an
authorization code (e.g. from a membership card). This automatically directs
the print-outs to the most appropriate printer, e.g., at the member’s institute
or directly at the library. Costs are deducted from the member’s or institution’s
account. Using light-weight bar code readers, this avoids the typical drawbacks
of traditional copying, such as physically carrying the volumes to the queue at the
copier, lengthy operations that also damage the books, and the often poor
quality of the copies. Technically, our solution combines

— a Digital Object Identifier (DOI), a unique and persistent identification code 
identical for printed and electronic versions of a document, 

— an IDOCs™-enabled bar code on the printed documents corresponding to
the DOI, and

— an internet service managing the overall workflow. 

The Springer Verlag is among the publishers supporting the introduction of the
DOI global identification system for intellectual property in the digital environment.
Additionally, in our prototype, convenient and intelligent access to
electronic documents is realized by linking light-weight string code technology
with the power of the modern internet. Via the bar code or the numeric code below
it, IDOCs (NeoMedia Technologies’s Intelligent Document Solutions) guides the
interested party over the internet directly to a specific, customizable online service
offered by MetaFrame Technologies which manages the subsequent workflow. In
particular, it is possible to direct the request to further detailed information
on a web page (here, the online version of the paper), and to add extra functionality
(printing and billing) without modifying the underlying web presence.
This combination of technologies guarantees not only the shortest distance to
the required information but also enables additional, flexible support.

Common goals of the present project are to provide a framework for managing
intellectual content, link customers with publishers, facilitate electronic
commerce, and enable automated copyright management. The technology is in
fact easily applicable to many more scenarios in which establishing the currently
lacking coherence between printed and online information would dramatically
simplify the workflow for library users, librarians and publishers.

IDOCs is a registered trademark of NeoMedia Technologies.
Abstract. In the past, attempts to convince practising software engineers to
adopt formal methods of software development were generally unsuccessful. 
The methods were too difficult to learn and use, provided inadequate tool 
support and did not integrate well into the software development process. In 
short, they could only be used effectively by the gods who created them! Are 
we in a better position today? Recent advances in and experience with 
specification techniques and automated model checking have demonstrated the 
utility of these techniques. In this paper we outline one such effort which is 
specifically intended to facilitate modelling as part of the software process, and 
to try to make model specification and model checking accessible to mere 
mortals. 



1 Introduction 

The ACM 50th Anniversary edition of Computing Surveys contains two excellent
papers describing the state of the art and research directions in Formal Methods [1] 
and Concurrency [2]. The former paper describes the failure of past formal methods 
to make a real impact on practising software engineers, indicating that they were just 
too difficult to learn and use, provided inadequate tool support and did not integrate 
well into the software development process. The paper describes how recent advances
in fundamental research and improved technology have enabled a number of
specification and verification techniques to be used in practice on major industrial
case studies. However, if we are to make a significant impact on practice, we must
still seek to make our methods and tools more attractive and accessible to practitioners.
In addition, we need to provide the associated education and technology transfer. 

The latter paper [2] provides an excellent overview of the issues and current research 
in the field of concurrency. Again, the need for education, technology transfer and 
improved integration with the software lifecycle is identified. Furthermore, the paper 
states: “Traditionally, software engineering devotes much attention to organizational
and procedural issues in software development and relatively little to methods for 
system analysis; in this respect it resembles a management discipline rather than an 
engineering one. Tools based on concurrency theory offer a particularly appropriate 
starting point for putting the engineering into software engineering.” 



W.R. Cleaveland (Ed.): TACAS/ETAPS'99, LNCS 1579, pp. 1-18, 1999. 
© Springer-Verlag Berlin Heidelberg 1999
One of the main aims of our research is to provide sound and accessible techniques
for modelling and model verification associated with the development of concurrent
and distributed systems. We are acutely aware of the problems of advocating
powerful but erudite approaches usable only by their developers. We have
therefore sought to adopt and adapt concepts and techniques which offer the best hope
of widescale use by ordinary, competent engineers. We recognise that no one method
or tool will suffice for all purposes, and believe that it is better to provide a well-focussed
usable approach than one which is more general but less usable. Hence we
focus our work on the modelling of only a particular aspect of a system - concurrent 
behaviour. 

Integration with the software development process 

Our approach exploits the software architecture as the underlying design structure of a 
system, common to the various phases of system development. In particular, we use 
the Darwin architecture description language [3,4] which has been designed to be 
sufficiently abstract to support multiple views (Figure 1). Each view is an elaboration 
of the basic structural view: the skeleton upon which we hang the flesh of behaviour 
specification or service implementation [5]. The service view describes the system as 
a hierarchical composition of components, each of which provides and requires 
services at its interface, with implementation elaborations for the primitive 
components. The behavioural view models the system as a hierarchical, parallel 
composition of component processes, each of which interacts with other processes via 
shared actions at its interface, with behaviour elaborations for the primitive 
component processes. In essence, the architecture drives the process of putting 
together individual component specifications or implementations in order to obtain a 
system with desirable characteristics. When performing analysis, these characteristics 
are formally described in terms of properties against which the specified system is 
checked. 



[Figure: the Structural View, elaborated into a Behavioural View (used for Analysis) and a Service View (used for Construction/implementation)]

Fig. 1. Common Structural View, with Service and Behavioural Views 
Specification and Analysis

State machines are a popular modelling technique which is widely taught and used. 
For this reason, we were attracted to the use of Labelled Transition Systems (LTS) as
the underlying formalism for our work. For the verification of finite models, model 
checking offers a fast, automatic technique with the benefit of providing 
counterexamples as feedback when property violations are detected. This satisfies our 
belief that automated tools are essential to aid verification. Furthermore, LTS 
supports the appropriate compositionality (using Compositional Reachability
Analysis, CRA) with the components specified simply as finite state processes (FSP)
[6]. Compositional methods are desirable in that they can reflect the
structure of the system. In addition, we have techniques for ameliorating the problem 
of exponential state explosion in some circumstances [7] and for analysing for both 
safety [8] and liveness [9] properties. Our property checking mechanisms have been 
specifically designed for our models that focus on actions rather than states and also 
address issues related to CRA techniques. As liveness property checks can be 
expensive, we have also identified a subclass of such properties that occur frequently 
in practice, and which can be checked directly on the graph of the system, without the 
use of Büchi automata [10]. This class has been named progress. Finally, our methods
also support action priority, which allows users to concentrate on specific parts of 
system behaviour, to impose adverse conditions, or perform a partial search when an 
exhaustive search cannot be achieved. 

All this is supported by the LTS Analyser (LTSA) which provides for automatic 
composition, analysis, minimisation, animation and graphical display [5,11,12]. 

Paper outline 

Rather than describe the underlying theory or details of the analysis techniques, we 
use an example to illustrate our general approach and tool support. In particular, we 
use a Supervisor-Worker/Tuple Space example of a concurrent architecture to 
illustrate the use of FSP/LTS and the LTSA tools for specification and reasoning. For 
the sake of brevity, we make very little attempt to compare our work with that of 
others. The Formal Methods and Concurrency papers [1,2] provide an excellent 
survey of the field and of related work. Instead we merely try to indicate the reasons 
for some of our choices. We introduce only as much of the notation and analysis 
techniques as necessary for the example. Full details can be found in [13] and at the 
web site: http://www-dse.doc.ic.ac.uk/concurrency/. 
2 Architectural Design and Analysis: an Example



2.1 Supervisor-Worker Description 

Supervisor-Worker is a concurrent architecture that can be used to speed up the 
execution of some computational problems by exploiting parallel execution on 
multiple processors. The architecture applies when a computational problem can be 
split up into a number of independent sub-problems. These independent sub-problems 
are referred to as tasks in the following. The process architecture of a Supervisor-Worker
program is depicted in Figure 2.







Fig. 2. Supervisor-Worker process architecture 

Supervisor and worker processes interact by a connector that we refer to, for the 
moment, as a “bag”. The supervisor process is responsible for generating an initial set 
of tasks and placing them in the bag. Additionally, the supervisor collects results from 
the bag and determines when the computation has finished. Each worker repetitively 
takes a task from the bag, computes the result for that task, and places the result in the 
bag. This process is repeated until the supervisor signals that the computation has 
finished. We can use any number of worker processes in the Supervisor-Worker
architecture. First, we examine an interaction mechanism suitable for implementing 
the bag connector. 
2.2 Linda Tuple Space

Linda is the collective name given by Carriero and Gelernter [14] for a set of 
primitive operations used to access a data structure called a tuple space. A tuple space 
is a shared associative memory consisting of a collection of tagged data records called 
tuples. Each data tuple in a tuple space has the form: 

("tag", value1, ..., valuen)

The tag is a literal string used to distinguish between tuples representing different
classes of data, and value1, ..., valuen are zero or more data values: integers, floats and so on.

There are three basic Linda operations for manipulating data tuples: out, in and rd. A 
process deposits a tuple in a tuple space using: 

out("tag", expr1, ..., exprn)

Execution of out completes when the expressions have been evaluated and the 
resulting tuple has been deposited in the tuple space. A process removes a tuple from 
tuple space by executing: 

in("tag", field1, ..., fieldn)

Each field is either an expression or a formal parameter of the form ?var, where var is
a local variable in the executing process. The arguments to in are called a template; 
the process executing in blocks until the tuple space contains a tuple that matches the 
template and then removes it. A template matches a data tuple in the following 
circumstances: if the tags are identical, the template and tuple have the same number 
of fields, the expressions in the template are equal to the corresponding values in the 
tuple, and the variables in the template have the same type as the corresponding 
values in the tuple. When the matching tuple is removed from the tuple space, the 
formal parameters in the template are assigned the corresponding values from the 
tuple. 
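The matching rule just described can be made concrete with a short Python sketch. It is an illustration only, not Linda's implementation: tuples are assumed to be Python tuples whose first element is the tag, and the `Formal` class (standing in for a `?var` parameter) and the `match` function are hypothetical names.

```python
from typing import Any, List, Optional, Tuple


class Formal:
    """Hypothetical stand-in for a formal parameter ?var of a given type."""

    def __init__(self, typ: type) -> None:
        self.typ = typ


def match(template: Tuple[Any, ...], tup: Tuple[Any, ...]) -> Optional[List[Any]]:
    """Return the values bound to the template's formals, or None if no match."""
    # The tags must be identical and both must have the same number of fields.
    if not template or len(template) != len(tup) or template[0] != tup[0]:
        return None
    bindings: List[Any] = []
    for field, value in zip(template[1:], tup[1:]):
        if isinstance(field, Formal):
            # A formal matches any value of exactly the declared type.
            if type(value) is not field.typ:
                return None
            bindings.append(value)
        elif field != value:
            # An expression field must equal the corresponding tuple value.
            return None
    return bindings


# in("task", 3, ?x) against the tuple ("task", 3, 2.5) binds x to 2.5.
print(match(("task", 3, Formal(float)), ("task", 3, 2.5)))  # [2.5]
print(match(("task", 4, Formal(float)), ("task", 3, 2.5)))  # None
```

Returning the list of bindings mirrors the assignment of the formal parameters that happens when the matching tuple is removed.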

The third basic operation is rd, which functions in exactly the same way as in except 
that the tuple matching the template is not removed from the tuple space. The 
operation is used to examine the contents of a tuple space without modifying it. Linda 
also provides non-blocking versions of in and rd called inp and rdp which return true 
if a matching tuple is found and return false otherwise. Linda has a sixth operation 
called eval that creates an active or process tuple. It is not used in the following 
example. 
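The five operations can be sketched as a small thread-safe Python class. This is a hypothetical illustration, not the interface of any Linda implementation: `in` is a Python keyword, so the blocking removal is called `in_`; `inp` and `rdp` return the matching tuple or `None` instead of true/false; and, to keep the sketch short, a template field of `None` acts as a wildcard formal.

```python
import threading
from typing import Any, List, Optional, Tuple


def _matches(template: Tuple[Any, ...], tup: Tuple[Any, ...]) -> bool:
    # Simplified matching: same arity, and each field equal or a None wildcard.
    return len(template) == len(tup) and all(
        f is None or f == v for f, v in zip(template, tup))


class TupleSpace:
    """Minimal in-memory tuple space (illustrative sketch only)."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []
        self._cond = threading.Condition()

    def _find(self, template: Tuple[Any, ...]) -> Optional[int]:
        for i, tup in enumerate(self._tuples):
            if _matches(template, tup):
                return i
        return None

    def out(self, *tup: Any) -> None:
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake processes blocked in in_ or rd

    def in_(self, *template: Any) -> Tuple[Any, ...]:
        # Block until a matching tuple exists, then remove and return it.
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None)
            return self._tuples.pop(self._find(template))

    def rd(self, *template: Any) -> Tuple[Any, ...]:
        # Like in_, but the matching tuple stays in the space.
        with self._cond:
            self._cond.wait_for(lambda: self._find(template) is not None)
            return self._tuples[self._find(template)]

    def inp(self, *template: Any) -> Optional[Tuple[Any, ...]]:
        # Non-blocking in: None plays the role of "false".
        with self._cond:
            i = self._find(template)
            return None if i is None else self._tuples.pop(i)

    def rdp(self, *template: Any) -> Optional[Tuple[Any, ...]]:
        # Non-blocking rd.
        with self._cond:
            i = self._find(template)
            return None if i is None else self._tuples[i]


ts = TupleSpace()
ts.out("task", 1)
print(ts.rdp("stop"))        # None
print(ts.inp("task", None))  # ('task', 1)
```

A single condition variable with `notify_all` is the simplest correct choice here; every blocked caller re-checks its own template after each `out`.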

FSP Notation 

The behavioural specification for the tuple space involves describing it in the FSP
process algebra-based notation [5,11,12,13]. This notation is used as a concise way of 
describing the Labelled Transition System (LTS) of the tuple space for analysis 
purposes. It is an “ASCII” notation to simplify parsing by the analysis tools. The 
original intention was to provide a graphical tool for drawing LTS diagrams. 
However, it soon became clear that this was clumsy and inappropriate for all but the 
simplest of models. Hence we instead provide a means for translating and displaying
FSP specifications as LTS diagrams as feedback to the designers. 

Primitive components are defined as finite state processes in FSP using action prefix
"->", choice "|", and recursion. If x is an action and P a process then (x->P)
describes a process that initially engages in the action x and then behaves exactly as 
described by P. If x and y are actions then (x->P|y->Q) describes a process 
which initially engages in either of the actions x or y, and the subsequent behaviour 
is described by P or Q, respectively. Guards can be used to control the choice of 
action. 

Processes can be composed using the parallel composition operator “||”. Processes
interact by synchronising on their shared actions, with interleaving of all other 
actions. We have adopted the broadcast semantics of CSP [15] for interaction as this 
facilitates the composition of property automata to check that the composed system 
satisfies desired properties. 
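The synchronisation rule can be illustrated with a minimal Python sketch of the parallel composition of two LTSs, explored from the initial state. It is not how LTSA is implemented; the `LTS` class and the `compose` function are hypothetical. Composing more than two processes pairwise in this way requires every process that has an action in its alphabet to take part in it, which matches the broadcast style of interaction described above.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

Moves = List[Tuple[str, Hashable]]


class LTS:
    """An LTS given by an initial state, an alphabet and a successor table."""

    def __init__(self, init: Hashable, alphabet: Iterable[str],
                 trans: Dict[Hashable, Moves]) -> None:
        self.init = init
        self.alphabet: FrozenSet[str] = frozenset(alphabet)
        self.trans = trans

    def moves(self, state: Hashable) -> Moves:
        return self.trans.get(state, [])


def compose(p: LTS, q: LTS) -> LTS:
    """P || Q: synchronise on shared actions, interleave all other actions."""
    shared = p.alphabet & q.alphabet
    init = (p.init, q.init)
    trans: Dict[Hashable, Moves] = {}
    frontier = deque([init])
    while frontier:
        state = frontier.popleft()
        if state in trans:
            continue
        ps, qs = state
        succ: Moves = []
        for a, p2 in p.moves(ps):
            if a in shared:
                # Shared action: P and Q must perform it together.
                succ += [(a, (p2, q2)) for b, q2 in q.moves(qs) if b == a]
            else:
                succ.append((a, (p2, qs)))  # P moves on its own
        for b, q2 in q.moves(qs):
            if b not in shared:
                succ.append((b, (ps, q2)))  # Q moves on its own
        trans[state] = succ
        frontier.extend(t for _, t in succ if t not in trans)
    return LTS(init, p.alphabet | q.alphabet, trans)


# P = (a -> b -> P) and Q = (b -> c -> Q) share only the action b.
P = LTS(0, {"a", "b"}, {0: [("a", 1)], 1: [("b", 0)]})
Q = LTS(0, {"b", "c"}, {0: [("b", 1)], 1: [("c", 0)]})
PQ = compose(P, Q)
print(len(PQ.trans), PQ.moves(PQ.init))  # 4 [('a', (1, 0))]
```

Only reachable product states are generated, which is why the composed LTS here has four states rather than being computed over every pair up front.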

Tuple Space Model 

Our modeling approach requires that we construct finite state models. Consequently, 
we must model a tuple space with a finite set of tuple values. In addition, since a tuple 
space can contain more than one tuple with the same value, we must fix the number 
of copies of each value that are allowed. We define this number to be the constant N 
and the allowed values to be the set Tuples. 

const N = ... 

set Tuples = {...} 

The precise definition of N and Tuples depends on the context in which we use the 
tuple space model. Each tuple value is modelled by an FSP label of the form 
tag.val1...valn. We define a process to manage each tuple value and the tuple space is
then modelled by the parallel composition of these processes: 

const False = 0
const True  = 1
range Bool  = False..True

TUPLE(T='any) = TUPLE[0],
TUPLE[i:0..N]
  = (out[T]                    -> TUPLE[i+1]
    |when (i>0)  in[T]         -> TUPLE[i-1]
    |when (i>0)  inp[True][T]  -> TUPLE[i-1]
    |when (i==0) inp[False][T] -> TUPLE[i]
    |when (i>0)  rd[T]         -> TUPLE[i]
    |rdp[i>0][T]               -> TUPLE[i]
    ).

||TUPLESPACE = forall [t:Tuples] TUPLE(t).
The LTS for TUPLE value any with N=2 is depicted in figure 3. Note that we have
not specified a guard for the out action. Hence, exceeding the capacity by
performing more than two out operations leads to an ERROR. This is indicated by
the ERROR state -1 in figure 3. This is a form of trap state [8] which, if reachable,
indicates that an error is possible in the system. To aid the specification process, the
LTSA compiler automatically maps such undefined transitions to the ERROR state. 
As shown later, the ERROR state is also used in property automata to check for the 
violation of safety properties. 
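To make the bounded-counter model concrete, the following Python sketch builds the transitions of TUPLE(any) for a given capacity, mapping the unguarded out at full capacity to the ERROR state -1 as described above. The function names `tuple_lts` and `run` are hypothetical; labels such as `inp.1.any` spell the FSP action inp[True][any] with True = 1.

```python
from typing import Dict, List, Optional

ERROR = -1


def tuple_lts(n: int, t: str = "any") -> Dict[int, Dict[str, int]]:
    """Transitions of TUPLE(t); the state is the number of copies held (0..n)."""
    trans: Dict[int, Dict[str, int]] = {}
    for i in range(n + 1):
        # out is unguarded: exceeding the capacity leads to ERROR.
        moves = {f"out.{t}": i + 1 if i < n else ERROR}
        if i > 0:
            moves[f"in.{t}"] = i - 1
            moves[f"inp.1.{t}"] = i - 1
            moves[f"rd.{t}"] = i
        else:
            moves[f"inp.0.{t}"] = i
        moves[f"rdp.{int(i > 0)}.{t}"] = i  # rdp reports whether a copy exists
        trans[i] = moves
    return trans


def run(trans: Dict[int, Dict[str, int]], trace: List[str]) -> Optional[int]:
    """Follow a trace from state 0; None means some action was refused."""
    state = 0
    for action in trace:
        if state == ERROR:
            break
        if action not in trans[state]:
            return None
        state = trans[state][action]
    return state


lts = tuple_lts(2)
print(run(lts, ["out.any"] * 3))  # -1, i.e. ERROR
print(run(lts, ["in.any"]))       # None: in is refused when the count is 0
```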

A tuple space is then defined as the parallel composition of tuples, for all types of 
tuple in the Tuples set. 







Fig. 3. TUPLE LTS 



An example of a conditional operation on the tuple space would be:

inp[b:Bool][t:Tuples]

The value of the local variable t is only valid when b is true. Each TUPLE process 
has in its alphabet the operations on one specific tuple value. The alphabet of 
TUPLESPACE is defined by the set TupleAlpha: 

set TupleAlpha = {{in, out, rd, rdp[Bool], inp[Bool]}.Tuples}

A process that shares access to the tuple space must include all the actions of this set 
in its alphabet. 
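The set expression above expands into concrete action labels by taking every combination of operation (with its Bool index, written 0 or 1) and tuple value. A short Python sketch of that expansion, using the hypothetical helper name `tuple_alpha`, shows which labels a process sharing the tuple space must include:

```python
from itertools import product
from typing import Iterable, Set

BOOL = (0, 1)  # range Bool = False..True


def tuple_alpha(tuples: Iterable[str]) -> Set[str]:
    """Expand {in, out, rd, rdp[Bool], inp[Bool]}.Tuples into action labels."""
    ops = ["in", "out", "rd"]
    ops += [f"{op}.{b}" for op, b in product(["rdp", "inp"], BOOL)]
    return {f"{op}.{t}" for op, t in product(ops, list(tuples))}


print(sorted(tuple_alpha(["any"])))
# ['in.any', 'inp.0.any', 'inp.1.any', 'out.any', 'rd.any', 'rdp.0.any', 'rdp.1.any']
print(len(tuple_alpha(["task", "result", "stop"])))  # 21
```

Each tuple value therefore contributes seven actions, which is why extending a process's alphabet with TupleAlpha grows quickly with the size of Tuples.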
Animation

It is often the case that the LTS of a specified process is too complex to rely merely 
on inspection to convince oneself that it models the required behaviour. Animation 
can be used to test a specification. For instance, for the single tuple defined by the set, 

set Tuples = {any} 

LTSA permits a designer to step through the scenario given in figure 4. The actions 
eligible at any time are indicated by a tick, and the trace of actions is given on the left. 
However, for exhaustive property checking, we use property automata. 




Fig. 4. Animation of the TUPLESPACE for tuple any 



Property Automata 

Checks can be made that the model satisfies certain safety properties by specifying 
these properties as automata and composing them with the system. For example, the 
following property asserts that an in action must always have been preceded by a 
matching out action. 

property CHECK(T='any) = CHECK[0],
CHECK[i:0..N]
  = (when (i<N) out[T] -> CHECK[i+1]
    |when (i>0) in[T]  -> CHECK[i-1]
    ).



This generates the image automaton with the LTS shown in figure 5. As illustrated,
property automata are automatically made complete by replacing any undefined
transition with a transition to the ERROR state. In the final system, safety property
violations are identified by the reachability of the ERROR state.




Fig. 5. Property CHECK 

This can be composed with TUPLESPACE as follows using the parallel composition
operator.

||TUPLESPACE =
    forall [t:Tuples] (TUPLE(t) || CHECK(t)).

The LTSA analysis tool detects the following violation of property CHECK for tuple
any, as well as the violation specified in TUPLE itself.

Composing
    property CHECK(any) violation,
    property TUPLE(any) violation.

States Composed: 6 Transitions: 24 in 0ms
Trace to property violation in TUPLE(any):
    out.any
    out.any
    out.any

Hence, as expected, TUPLESPACE does not permit an in action to be executed before
a matching out action; the violations reported arise only from exceeding the capacity N.
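The check performed here amounts to searching the composed state space for a shortest path to ERROR. The Python sketch below does this for TUPLE(any) || CHECK(any) with N = 2, restricted to the out and in actions for brevity. It is an illustration under those assumptions, not LTSA's algorithm, and the function names are hypothetical. Because CHECK is complete, it never refuses an action, so only TUPLE's guard can block in, and the search finds the overflow trace rather than an in before an out.

```python
from collections import deque
from typing import List, Optional

ERROR = -1
N = 2
ACTIONS = ("out.any", "in.any")


def tuple_step(i: int, a: str) -> Optional[int]:
    """TUPLE(any): None means the action is refused in state i."""
    if a == "out.any":
        return i + 1 if i < N else ERROR    # out is unguarded: overflow -> ERROR
    if a == "in.any":
        return i - 1 if i > 0 else None     # in is guarded by i>0
    return None


def check_step(i: int, a: str) -> int:
    """Property CHECK(any), made complete: undefined transitions go to ERROR."""
    if a == "out.any":
        return i + 1 if i < N else ERROR
    return i - 1 if i > 0 else ERROR        # a == "in.any"


def trace_to_error() -> Optional[List[str]]:
    """Breadth-first search of TUPLE || CHECK for a shortest trace to ERROR."""
    start = (0, 0)
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if ERROR in s:
            trace = []
            while parent[s] is not None:
                s, a = parent[s]
                trace.append(a)
            return trace[::-1]
        for a in ACTIONS:
            # Both processes have every action in their alphabets,
            # so an action occurs only if both can take it.
            t = tuple_step(s[0], a)
            if t is None:
                continue
            nxt = (t, check_step(s[1], a))
            if nxt not in parent:
                parent[nxt] = (s, a)
                frontier.append(nxt)
    return None


print(trace_to_error())  # ['out.any', 'out.any', 'out.any']
```

Breadth-first order is what makes the returned counterexample a shortest one, which keeps the feedback to the designer as small as possible.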



2.3 Supervisor-Worker Model 

We model a simple supervisor-worker system in which the supervisor initially outputs 
a set of tasks to the tuple space and then collects results. Each worker repetitively gets 
a task and computes the result. The algorithms for the supervisor and each worker 
process are sketched below: 

Supervisor::
    forall tasks: out("task", ...)
    forall results: in("result", ...)
    out("stop")

Worker::
    while not rdp("stop") do
        in("task", ...)
        compute result
        out("result", ...)

To terminate the program, the supervisor outputs a tuple with the tag “stop” when it 
has collected all the results it requires. Workers run until they read this tuple. The set 
of tuple values and the maximum number of copies of each value are defined for the 
model as: 

const N = 2

set Tuples = {task, result, stop}

The supervisor outputs N tasks to the tuple space, collects N results and then outputs 
the “stop” tuple and terminates. 

SUPERVISOR = TASK[1],
TASK[i:1..N] =
    (out.task ->
        if i<N then TASK[i+1] else RESULT[1]),
RESULT[i:1..N] =
    (in.result ->
        if i<N then RESULT[i+1] else FINISH),
FINISH =
    (out.stop -> end -> STOP) + TupleAlpha.

As illustrated, FSP supports the definition of conditional processes using if then
else. The STOP process is one which engages in no further actions. For ease of use,
the alphabet for a process is defined implicitly by the actions in its definition. This is 
generally more convenient than explicit definition. However, in order to ensure that 
no free actions can occur in the tuple space, we use “+” to explicitly extend the 
alphabet of the supervisor to include all the actions in the shared tuple space. 

The worker checks for the “stop” tuple before getting a task and outputting the result. 
The worker terminates when it reads “stop” successfully. 

WORKER =
    (rdp[b:Bool].stop ->
        if (!b) then
            (in.task -> out.result -> WORKER)
        else
            (end -> STOP)
    ) + TupleAlpha.
The LTS for both SUPERVISOR and WORKER with N=2 is depicted in figure 6.



[Figure: SUPERVISOR LTS with the transitions out.task, out.task, in.result, in.result, out.stop, end; WORKER LTS with transitions including rdp.1.stop]




Fig. 6. SUPERVISOR and WORKER LTS 

In order to avoid detecting a deadlock in the case of correct termination, we provide a 
process that can still engage in actions after the end action has occurred. We define 
an END process that engages in the action ended after the correct termination action 
end occurs. 

END   = (end -> ENDED),
ENDED = (ended -> ENDED).

A supervisor-worker model with two workers called redWork and blueWork,
which conforms to the architecture of Figure 2, can now be defined as follows:

||SUPERVISOR_WORKER
    = ( supervisor:SUPERVISOR
      || {redWork,blueWork}:WORKER
      || {supervisor,redWork,blueWork}::TUPLESPACE
      || END
      )/{end/{supervisor,redWork,blueWork}.end}.

We use “:” to define a named process instance. The effect is to prefix each label in
the alphabet of the process by the instance name, e.g. supervisor. This also
supports the definition of multiple named process instances, e.g. the workers redWork
and blueWork. For shared resources, such as the tuple space, every transition needs
to be capable of being shared with any of the supervisor or worker processes. The
notation “::” indicates that every action in the tuple space becomes a choice,
prefixed by each of the user processes: supervisor, redWork, and blueWork.
Finally, relabelling “/” is used to ensure that all processes engage in the same end
action together.
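The three operators differ in how they rename transitions, which the following Python sketch makes explicit on LTSs represented as successor tables. The helper names and the one-slot `BAG` process are hypothetical illustrations. In particular, `name:P` yields a private copy per instance, whereas `{n1,...,nk}::P` keeps a single copy whose every transition can be taken by any of the named processes.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

Table = Dict[int, List[Tuple[str, int]]]


def instance(trans: Table, name: str) -> Table:
    """name:P -- prefix every action of P with the instance name."""
    return {s: [(f"{name}.{a}", t) for a, t in moves] for s, moves in trans.items()}


def shared(trans: Table, names: Iterable[str]) -> Table:
    """{n1,...,nk}::P -- each transition a becomes a choice n.a for every n."""
    names = list(names)
    return {s: [(f"{n}.{a}", t) for a, t in moves for n in names]
            for s, moves in trans.items()}


def relabel(trans: Table, new_for_old: Mapping[str, str]) -> Table:
    """P/{new/old}: rename actions; labels not in the mapping are unchanged."""
    return {s: [(new_for_old.get(a, a), t) for a, t in moves]
            for s, moves in trans.items()}


BAG = {0: [("out.task", 1)], 1: [("in.task", 0)]}  # hypothetical one-slot bag
print(shared(BAG, ["supervisor", "redWork"])[0])
# [('supervisor.out.task', 1), ('redWork.out.task', 1)]
```

Mapping every `name.end` to a single `end` label is what forces all processes to take the termination step together once they are composed.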
Analysis

Safety analysis of this model using LTSA reveals no ERROR violations. However, it 
does detect the following potential deadlock: 



Trace to DEADLOCK:
    supervisor.out.task
    supervisor.out.task
    redWork.rdp.0.stop      -- rdp returns false
    redWork.in.task
    redWork.out.result
    supervisor.in.result
    redWork.rdp.0.stop      -- rdp returns false
    redWork.in.task
    redWork.out.result
    supervisor.in.result
    redWork.rdp.0.stop      -- rdp returns false
    supervisor.out.stop
    blueWork.rdp.1.stop     -- rdp returns true



This trace is for an execution in which the red worker computes the results for the two 
tasks put into tuple space by the supervisor. This is quite legitimate behaviour for a 
real system since workers can run at different speeds and take different amounts of 
time to start. The deadlock occurs because the supervisor only outputs the “stop” 
tuple after the red worker attempts to read it. When the red worker tries to read, the 
“stop” tuple has not yet been put into the tuple space, and consequently, the worker 
does not terminate but blocks waiting for another task. Since the supervisor has 
finished, no more tuples will be put into the tuple space and consequently, the worker 
will never terminate. 
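The deadlock can be reproduced with a hand-written explicit-state search. The Python sketch below encodes the supervisor, the two workers, the tuple counts and the END process directly. It does not parse FSP, and the state encoding and function names are hypothetical. It reports a state with no outgoing transitions together with a shortest trace reaching it. Because the `end` action must be shared by all processes and is followed by the perpetual `ended` action, correct termination is not reported as a deadlock. The trace found by breadth-first search need not be identical to the one LTSA printed above.

```python
from collections import deque
from typing import Iterator, List, Optional, Tuple

N = 2                                   # number of tasks, as in the FSP model
WORKERS = ("redWork", "blueWork")

# State: (supervisor pc, worker states, #task, #result, #stop, ended flag).
# Supervisor pc: 0..N-1 out.task, N..2N-1 in.result, 2N out.stop, 2N+1 ready for end.
# Worker states: "rdp", "in", "out" (next action) or "end" (ready for end).
State = Tuple[int, Tuple[str, ...], int, int, int, bool]


def _set(workers: Tuple[str, ...], k: int, new: str) -> Tuple[str, ...]:
    return workers[:k] + (new,) + workers[k + 1:]


def successors(s: State) -> Iterator[Tuple[str, State]]:
    sup, ws, task, result, stop, ended = s
    if ended:
        yield "ended", s                # END keeps engaging in ended
        return
    if sup < N:
        yield "supervisor.out.task", (sup + 1, ws, task + 1, result, stop, ended)
    elif sup < 2 * N and result > 0:
        yield "supervisor.in.result", (sup + 1, ws, task, result - 1, stop, ended)
    elif sup == 2 * N:
        yield "supervisor.out.stop", (sup + 1, ws, task, result, stop + 1, ended)
    for k, (name, w) in enumerate(zip(WORKERS, ws)):
        if w == "rdp":                  # rdp never blocks; b says whether stop exists
            b = int(stop > 0)
            nxt = "end" if b else "in"
            yield f"{name}.rdp.{b}.stop", (sup, _set(ws, k, nxt), task, result, stop, ended)
        elif w == "in" and task > 0:    # in.task blocks while no task tuple exists
            yield f"{name}.in.task", (sup, _set(ws, k, "out"), task - 1, result, stop, ended)
        elif w == "out":
            yield f"{name}.out.result", (sup, _set(ws, k, "rdp"), task, result + 1, stop, ended)
    # end is shared (after relabelling) by the supervisor, every worker and END.
    if sup == 2 * N + 1 and all(w == "end" for w in ws):
        yield "end", (sup, ws, task, result, stop, True)


def find_deadlock() -> Optional[Tuple[List[str], State]]:
    """Breadth-first search for a state with no outgoing transitions."""
    start: State = (0, ("rdp",) * len(WORKERS), 0, 0, 0, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        moves = list(successors(s))
        if not moves:
            trace, cur = [], s
            while parent[cur] is not None:
                cur, action = parent[cur]
                trace.append(action)
            return trace[::-1], s
        for action, nxt in moves:
            if nxt not in parent:
                parent[nxt] = (s, action)
                queue.append(nxt)
    return None


found = find_deadlock()
if found is not None:
    print("\n".join(found[0]))
```

In every deadlock the search can reach, the supervisor has already output “stop” and at least one worker is blocked in in.task with no task tuple left, which is exactly the situation described above.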



This deadlock, which can be reproduced for different numbers of tasks and workers, indicates that the termination scheme we have adopted is incorrect. Although the supervisor completes the computation, workers may not terminate. The scheme relies on a worker being able to input tuples until it reads the “stop” tuple. As the model demonstrates, this may not happen. This would be a difficult error to observe in an implementation, since the program would produce the correct computational result. However, after an execution, worker processes would be blocked and consequently retain execution resources such as memory and system resources such as control blocks. Only after a number of executions might the user observe a system crash due to many hung processes. Nevertheless, this technique of using a “stop” tuple appears in an example Linda program in a standard textbook on concurrent programming!
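The deadlock can be reproduced outside LTSA with a small explicit-state search. The Python sketch below uses a simplified model of our own, not the FSP model of the paper:

- The tuple space is reduced to counts of task, result and stop tuples.
- The supervisor outputs `N_TASKS` tasks, collects as many results and then outputs stop.
- Each worker loops over rdp(stop), in(task) and out(result).

A breadth-first search over all interleavings returns a shortest trace to a state in which no action is enabled but some process has not finished.

```python
from collections import deque

N_TASKS = 2
WORKERS = ("redWork", "blueWork")
SUP_DONE = 2 * N_TASKS + 1  # supervisor pc: 0..N-1 out task, N..2N-1 in result, 2N out stop


def with_pc(workers, i, new_pc):
    """Copy of the worker program counters with worker i moved to new_pc."""
    return workers[:i] + (new_pc,) + workers[i + 1:]


def successors(state):
    """Yield (action, next_state) for every action enabled in state."""
    sup, workers, (tasks, results, stop) = state
    if sup < N_TASKS:
        yield "supervisor.out.task", (sup + 1, workers, (tasks + 1, results, stop))
    elif sup < 2 * N_TASKS and results > 0:
        yield "supervisor.in.result", (sup + 1, workers, (tasks, results - 1, stop))
    elif sup == 2 * N_TASKS:
        yield "supervisor.out.stop", (sup + 1, workers, (tasks, results, stop + 1))
    for i, (name, pc) in enumerate(zip(WORKERS, workers)):
        if pc == "rdp":  # rdp never blocks: it only reports whether a stop tuple exists
            if stop > 0:
                yield f"{name}.rdp.1.stop", (sup, with_pc(workers, i, "done"), (tasks, results, stop))
            else:
                yield f"{name}.rdp.0.stop", (sup, with_pc(workers, i, "in"), (tasks, results, stop))
        elif pc == "in" and tasks > 0:  # in blocks while no task tuple is present
            yield f"{name}.in.task", (sup, with_pc(workers, i, "out"), (tasks - 1, results, stop))
        elif pc == "out":
            yield f"{name}.out.result", (sup, with_pc(workers, i, "rdp"), (tasks, results + 1, stop))


def find_deadlock():
    """Breadth-first search; return a shortest trace to a deadlock, or None."""
    start = (0, ("rdp",) * len(WORKERS), (0, 0, 0))
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        moves = list(successors(state))
        finished = state[0] == SUP_DONE and all(pc == "done" for pc in state[1])
        if not moves and not finished:
            trace = []
            while parent[state] is not None:
                state, action = parent[state]
                trace.append(action)
            return trace[::-1]
        for action, nxt in moves:
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None


trace = find_deadlock()
```

In this simplified model every deadlock takes exactly 13 actions:

- five by the supervisor,
- two task inputs,
- two result outputs,
- four rdp operations.

The search therefore returns a trace of the same length as the LTSA trace above, although its interleaving may differ.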

A possible solution is for the supervisor to output a “task” tuple with a special stop 
value. When a worker inputs this value, it outputs it again and then terminates. 
Because a worker outputs the stop task before terminating, each worker will 
eventually input it and terminate. This termination technique appears in algorithms 
published by the designers of Linda [16]. The revised algorithms for supervisor and 
worker are sketched below: 
Supervisor::

    forall tasks:-   out("task", ...)
    forall results:- in("result", ...)
    out("task", stop)

Worker::

    while true do
        in("task", ...)
        if value is stop then out("task", stop); exit
        compute result
        out("result", ...)

The tuple definitions and models for supervisor and worker now become:

set Tuples = {task, task.stop, result}

SUPERVISOR = TASK[1],
TASK[i:1..N] =
    (out.task ->
        if i<N then TASK[i+1] else RESULT[1]),
RESULT[i:1..N] =
    (in.result ->
        if i<N then RESULT[i+1] else FINISH),
FINISH =
    (out.task.stop -> end -> STOP)
    + TupleAlpha.

WORKER =
    (in.task -> out.result -> WORKER
    |in.task.stop -> out.task.stop -> end -> STOP
    ) + TupleAlpha.

The revised model does not deadlock and does not violate any safety property. 
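The following Python sketch runs the revised termination scheme with threads and a minimal tuple space built on `threading.Condition`. The following are assumptions of this sketch, which is not derived from any particular Linda implementation:

- the `TupleSpace` class;
- the use of `None` as the special stop value;
- squaring as a stand-in for “compute result”.

```python
import threading

STOP = None  # hypothetical stand-in for the special "stop" task value


class TupleSpace:
    """Minimal Linda-style tuple space: out() adds a tuple, in_() removes one, blocking."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, tag):
        """Block until a tuple whose first field equals tag exists, then remove and return it."""
        with self._cond:
            while True:
                for i, tup in enumerate(self._tuples):
                    if tup[0] == tag:
                        return self._tuples.pop(i)
                self._cond.wait()


def worker(space):
    while True:
        _, value = space.in_("task")
        if value is STOP:
            space.out("task", STOP)  # pass the stop task on so the other workers see it too
            return
        space.out("result", value * value)  # placeholder for "compute result"


def supervisor(space, tasks):
    for t in tasks:
        space.out("task", t)
    results = [space.in_("result")[1] for _ in tasks]
    space.out("task", STOP)
    return results


def run(tasks, n_workers=2):
    """Run one supervisor and n_workers workers; report results and whether all workers exited."""
    space = TupleSpace()
    threads = [threading.Thread(target=worker, args=(space,), daemon=True)
               for _ in range(n_workers)]
    for th in threads:
        th.start()
    results = supervisor(space, tasks)
    for th in threads:
        th.join(timeout=5)
    return sorted(results), all(not th.is_alive() for th in threads)
```

For example, `run([1, 2, 3])` returns the squared results together with `True`, because every worker re-outputs the stop task before exiting. A single stop task is left in the tuple space at the end.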

Progress 

We have found that a check for a kind of liveness property, which we term progress, provides sufficient information on liveness in many examples [10]. Progress asserts that in any infinite execution of the system being modelled, all actions can occur infinitely often. In performing the progress check, we assume strongly fair choice: if a choice is executed infinitely often, every transition enabled in it is selected infinitely often. A named progress property such as END below asserts that at least one of the actions in its set occurs infinitely often. For instance, we can use the following progress property to check that our supervisor-worker model does indeed progress to the action ended.

progress END = {ended} 

LTSA reports: No progress violations detected. 



On the other hand, we can ask if, say, the workers can always accept a tuple from the 
tuple space, i.e. 

progress TASK = {{redWork,blueWork}.in}

As expected, we then get the following violation: 

Progress violation: TASK
Trace to terminal set of states:
    supervisor.out.task
    supervisor.out.task
    redWork.in.task
    redWork.out.result
    supervisor.in.result
    redWork.in.task
    redWork.out.result
    supervisor.in.result
    supervisor.out.task.stop
    redWork.in.task.stop
    redWork.out.task.stop
    blueWork.in.task.stop
    blueWork.out.task.stop
    end

Actions in terminal set: 

(ended) 





Fig. 7. Trace of Supervisor-Worker model 
This indicates that after receiving the task.stop tuple, the workers can no longer accept tuples, and that only the action ended is available in the terminal set. The sample trace again has the red worker computing both tasks. This trace can also be generated by animation, as shown in Figure 7.
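Under the strongly fair choice assumed above, an infinite execution eventually stays inside a terminal set of states. A terminal set is a strongly connected set of states with no transitions leaving it, and the execution performs every transition of that set infinitely often. A progress property therefore fails exactly when some reachable terminal set contains none of its actions.

The Python sketch below illustrates that check on a small hand-written LTS. It rests on these assumptions, chosen for brevity rather than efficiency:

- the dictionary representation, in which every state must appear as a key;
- the function names;
- the quadratic, reachability-based computation of components.

The example also matches whole action names rather than FSP label prefixes.

```python
def reachable(lts, start):
    """All states reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for _, nxt in lts[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def terminal_sets(lts, init):
    """Action sets of the terminal strongly connected components reachable from init.
    A component is terminal when no transition leaves it; components without any
    transitions (deadlocks) are ignored because they admit no infinite execution."""
    states = reachable(lts, init)
    reach = {s: reachable(lts, s) for s in states}
    result, assigned = [], set()
    for s in states:
        if s in assigned:
            continue
        component = {t for t in reach[s] if s in reach[t]}
        assigned |= component
        actions = {a for t in component for a, _ in lts[t]}
        if reach[s] == component and actions:
            result.append(actions)
    return result


def progress_violations(lts, init, progress_set):
    """Terminal sets in which no action of progress_set can ever occur again."""
    wanted = set(progress_set)
    return [acts for acts in terminal_sets(lts, init) if not acts & wanted]


# Hypothetical abstraction of the end of the run: worker inputs, then end, then ended forever.
lts = {
    0: [("redWork.in.task", 0), ("end", 1)],
    1: [("ended", 1)],
}
```

With this LTS, a property over `{"ended"}` has no violation. A property over `{"redWork.in.task"}` is violated by the terminal set `{"ended"}`, which mirrors the TASK violation above.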

Minimisation 

There is also a hiding operator @ which captures the notion of external interfaces of components, and is used in the specification of both primitive and composite components. Operator @ specifies the set of action labels (alphabet) which are visible at the interface of the component and thus may be shared with other components. It restricts the alphabet of the LTS to the actions prefixed by these labels. All other actions are “hidden” and will appear as “silent” or “tau” actions during analysis if they do not disappear during minimisation (minimisation is performed with respect to observational equivalence as defined by Milner [17]).

For instance, we can abstract from many of the actions of the supervisor-worker model to examine only the in actions of the workers. This can be specified as follows:

||MINIMISE = SUPERVISOR_WORKER
             @{{redWork,blueWork}.in.task, end}.

Minimisation produces a system with only seven states. The LTS is shown in Figure 8. It clearly illustrates the actions of the workers in accepting tasks and finally in dealing with the stop task.
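The renaming performed by @ can be sketched in Python as follows. The dictionary-based LTS is again an illustrative assumption. The prefix rule is our reading of “actions prefixed by these labels”. The sketch does not perform the later minimisation with respect to observational equivalence, which would also merge states that differ only by tau steps.

```python
TAU = "tau"


def visible(action, interface):
    """An action stays visible if it equals an interface label or extends one with
    further dot-separated parts (so redWork.in.task.stop is kept by redWork.in.task)."""
    return any(action == label or action.startswith(label + ".") for label in interface)


def hide(lts, interface):
    """Like 'P @ interface': rename every non-interface action of the LTS to tau."""
    return {s: [(a if visible(a, interface) else TAU, t) for a, t in edges]
            for s, edges in lts.items()}


# The interface of MINIMISE, with {redWork,blueWork}.in.task expanded by hand.
interface = {"redWork.in.task", "blueWork.in.task", "end"}
fragment = {0: [("supervisor.out.task", 1)], 1: [("redWork.in.task.stop", 2)], 2: []}
```

The prefix rule explains why inputs of the stop task, such as redWork.in.task.stop, survive hiding even though only in.task appears in the interface.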

Composing
States Composed: 27 Transitions: 41 in 0ms
MINIMIZE minimising....
Minimised States: 7 in 60ms

Fig. 8. Minimised LTS for the Supervisor-Worker model with hiding
3. Conclusions

In this paper, we have modelled the Supervisor-Worker architecture without reference
to a specific application. We were able to discover a problem with termination and to 
provide a general solution that can be used in any application implemented within the 
framework of the architecture. Thus modelling has been used as an integral part of the 
design process, at a fairly high level of design abstraction. 

Detailed aspects of a system can also be modelled and analysed. However, an issue 
that always arises when considering exhaustive state space search methods is 
scalability. We have used the current toolset, which has not yet been optimised for 
performance, to analyse an Active Badge System [11] in which the final model has
566,820 reachable states and 2,428,488 possible transitions. This took 400 seconds to 
construct and check on a 200MHz Pentium Pro and required 170Mb of store. The 
effect of using compositional reachability to reduce the state space in this example 
can be seen from the table below: 



(Badge, Location)     Reachable States   Transitions   Memory (Mb)   Time (secs)
(2,3)                           12,213        52,758             3             6
  minimised                      3,924        19,260            <1            12
(2,4)                           58,384       252,576            13            38
  minimised                     13,776        69,616             4            74
(2,5)                          202,275       871,350            52           180
  minimised                     39,600       201,650            22           337
(2,6)                          566,820     2,428,488           173           400
  minimised                     98,316       498,600            68         1,273



We believe that analysis and design are closely inter-linked activities which should 
proceed hand in hand. The notation FSP and its associated analysis tool LTSA have 
been carefully engineered to facilitate an incremental and interactive approach to the 
development of component-based systems. Analysis and animation can be carried out 
at any level of the architecture. Consequently, component models can be designed 
and debugged before composing them into larger systems. The analysis results are 
easily related to the architectural model of interconnected components. The LTSA 
analysis tool described in this paper is written in Java™ and can be run as an
application or applet. It is available at http://www-dse.doc.ic.ac.uk/concurrency . 

The approach we have described in this paper to analyzing software architectures is a 
general one, which is not restricted to a particular tool-set. For example, CSP/FDR 
[15,18] has been used with the architectural description language Wright [19] and 
both LOTOS/CADP [20] and Promela/SPIN [21] have been used in the context of
analysing software architectures. Our approach is distinguished by the direct use of
the architecture description in analysis, the use of compositional reachability as a way 
of managing the state space and, hopefully, by the ease of use of the toolset. 

Finally, we have experience of teaching the approach to a variety of students [13]: 
undergraduate students in the second year of three and four year Computing, Software 
Engineering and combined Computing/Electrical Engineering degree courses; and 
graduate students taking conversion courses in Computing. We are also investigating 
the utility of the approach in industry. These efforts constitute our contribution to 
education and technology transfer, and help to confirm our belief that the approach 
and toolset can be learnt and used to good effect within the lifetime of mere mortals. 
Abstract. A formal framework is proposed for the verification of complex real-time systems, modeled as client-server scheduling systems, using the popular model-checking approach. Model-checking is often restricted by the large state-space of complex real-time systems. The scheduling of tasks in such systems can be exploited to make model-checking more tractable. Our implementation and experiments corroborate the feasibility of such an approach. Wide applicability, significant state-space reduction, and support for several scheduling semantics are some of the important features of our theory and implementation.



1 Introduction 

Model-checking holds the promise of eventually providing formal, full, and automatic verification of complex industrial implementations. In spite of the recent success in the formal verification of real-time systems, it is still infeasible to formally verify large-scale real-world systems because of their high degree of complexity. On the other hand, engineers have developed various paradigms to help build and verify safer systems. One such paradigm is scheduling, which greatly simplifies the interaction among many processes by reducing it to contention for computation time among periodic and aperiodic tasks. Still, the scheduling paradigm on its own is too simplified for many complex systems, such as protocols, client-server systems, and communication systems. In this paper, we construct a theoretical framework which combines the advantages of model-checking and the scheduling paradigm, with several concurrent scheduling servers employing different scheduling policies. Our implementation and experiments show its benefit and feasibility in comparison with a naive verification effort, that is, a pure model-checking approach. Experimental data show that an exponential reduction in state-space size can be achieved.

In our framework, a scheduling client-server system consists of a set of servers, each with a specified scheduling policy, and a set of scheduling client automata, which are basically real-time automata extended with scheduling tasks specified in different modes. One major issue in such systems is the difficulty of reconciling two time scales: the job-computation time unit $\Delta_j$ and the schedulability-check time unit $\Delta_s$. Usually $\Delta_j$ is several orders of magnitude larger than $\Delta_s$. In real-time system model-checking, the time and space complexities are very often proportional to the timing constants used in the system description. With such a big disparity between $\Delta_j$ and $\Delta_s$, the complexity of model-checking a scheduling system can easily become unmanageable. In this work, we adopt the following technique. Systems are still described in the time unit $\Delta_j$. However, when a client is in a mode that requires a schedulability check, we derive formulas, with respect to the different scheduling policies, to calculate the computation time of the schedulability check, and the duration of the check is then rounded up to a whole number of $\Delta_j$ time units. With this technique, we circumvent the potential combinatorial complexity caused by the disparity between the two time scales.

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 19-33, 1999.
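A small calculation illustrates the unit conversion. The sketch below assumes, as a simplification of ours rather than the paper's policy-specific formulas, that a schedulability check costs a known number of processor operations. Multiplying that count by the ratio of one operation to one job time unit, and rounding up, gives the duration of the check in whole units of $\Delta_j$. Exact fractions avoid floating-point error at the rounding boundary. The ratio is derived from the figures in Section 4.1: a 200 MHz processor, two cycles per operation and a 1 ms job tick.

```python
from fractions import Fraction
from math import ceil


def check_duration(operations, ratio):
    """Duration of a schedulability check, in whole job-computation time units.

    operations: number of processor operations the check needs (hypothetical input)
    ratio: length of one operation divided by one job time unit
    """
    return ceil(operations * Fraction(ratio))


# 10^-8 s per operation and a 1 ms job tick give a ratio of 10^-5.
RATIO = Fraction(1, 10**5)
```

Any check of up to 100,000 operations rounds up to a single job time unit. A check of 250,000 operations occupies 3 units.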

Another major issue is when exactly the schedulability of the tasks in a mode should be checked. There are two alternatives: (1) checking before an incoming transition of the mode is taken, or (2) checking after an incoming transition of the mode is taken. Several different semantics of schedulability checking are possible; these are discussed in Subsection 3.1. The following is an example of a video system.

Example 1. Video System

Here, we have two servers and two clients in a Video-on-Demand system, illustrated in Fig. 1. The two clients concurrently issue task service requests to both servers. The two servers check whether the requests are schedulable and then either acknowledge or reject them. The server for movies schedules with the rate-monotonic (RM) scheduling policy, while the other uses the earliest-deadline-first (EDF) scheduling policy. Some popular scheduling policies are explained in Section 2.

The Movie Server stores a set of movie files ready for access by clients under the rate-monotonic scheduling policy. The Commercials Server stores a set of commercial files and works with the earliest-deadline-first scheduling policy. As shown in Fig. 1, the clients are modeled by finite-state automata that are enhanced with clocks and scheduling tasks. In the figure, boxes represent the different operational modes of the clients and arrows represent transitions between modes. $x$ and $y$ are the two clocks used to control the operation times in the client automata. For example, the assignment $x := 0$ beside an arrow means that clock $x$ is reset to zero during the transition. The predicate $x = 35$ on an outgoing transition in Client A means that the transmission of the movie “Pretty Woman” should end at 35 time units.

Within each box, we specify tasks by a tuple $(a, c, p, d, f)$, where $a$ is the server identification, $c$ is the computation time of the task within each period, $p$ is the period of the task, $d$ is the deadline for each instance of the task, and $f$ specifies whether fixed priority ($f = 1$) or dynamic priority ($f = 0$) is to be used. It is important that, at any instant of the computation, the task set admitted to each server remains schedulable. □

The outline of this paper is as follows. Section 2 gives a brief survey of the priority 
scheduling policies used in our system. Section 3 presents the formal system model 
and describes how model-checking is used to verify the system. Section 4 describes 
our implementation of the model-checking approach using the popular HyTech tool 
and shows the benefit of our approach using some application examples. Section 5 
concludes the paper. 

In the following, we use $\mathbb{N}$ and $\mathbb{R}^+$ to denote the set of non-negative integers and the set of non-negative real numbers, respectively.

2 Review of scheduling research 

A real-time system generally needs to process various concurrent tasks. A task is a finite sequence of computation steps that collectively perform some required action of a real-time system, and may be characterized by its execution time, deadline, etc. Periodic tasks are executed repeatedly, once per period of time. Each execution instance of a periodic task is called a job of that task. In a processor-controlled system, when a processor is shared between time-critical and non-time-critical tasks, efficient use of the processor can only be achieved by careful scheduling of the tasks. Here, time-critical tasks are assumed to be preemptive, independent, and periodic, and to have constant execution times with hard, critical deadlines. Scheduling may be time-driven or priority-driven. A time-driven scheduling algorithm determines the exact execution time of all tasks. A priority-driven scheduling algorithm assigns priorities to tasks, and these priorities determine which task is executed at a particular moment. We mainly consider time-critical periodic tasks that satisfy the above assumptions and are scheduled using priority-driven scheduling algorithms.

[Fig. 1 diagram: Movie Server (Rate-Monotonic) and Commercial Server S2 (Earliest Deadline First)]

Fig. 1. A video-on-demand system

Depending on the type of priority assignment, there are three classes of scheduling algorithms: fixed priority, dynamic priority, and mixed priority scheduling algorithms.

When the priorities assigned to tasks are fixed and do not change between job executions, the algorithm is called a fixed priority scheduling algorithm. When priorities change dynamically between job executions, it is called dynamic priority scheduling. When a subset of tasks is scheduled using fixed priority assignment and the rest using dynamic priority assignment, it is called mixed priority scheduling.

Before going into the details of scheduling algorithms, we define the task set to be scheduled as a set of $n$ tasks $\{\phi_1, \phi_2, \ldots, \phi_n\}$ with computation times $c_1, c_2, \ldots, c_n$, request periods $p_1, p_2, \ldots, p_n$, and phasings $h_1, h_2, \ldots, h_n$. A task $\phi_i$ is to be executed periodically for $c_i$ time units once every $p_i$ time units. The first job of task $\phi_i$ starts execution at time $h_i$. The worst-case phasing, called a critical instant, occurs when $h_i = 0$ for all $i$, $1 \le i \le n$.

Liu and Layland [LL73] proposed an optimal fixed priority scheduling algorithm called the rate-monotonic (RM) scheduling algorithm and an optimal dynamic priority scheduling algorithm called earliest-deadline-first (EDF) scheduling. The RM scheduling algorithm assigns higher priorities to tasks with higher request rates, that is, smaller request periods. Liu and Layland proved that the worst-case utilization bound of RM is $n(2^{1/n} - 1)$ for a set of $n$ tasks. This bound decreases monotonically from 0.83 when $n = 2$ to $\log_e 2 = 0.693$ as $n \to \infty$. This result shows that any periodic task set of any size will meet all deadlines all of the time if the RM scheduling algorithm is used and the total utilization $\sum_{i=1}^{n} c_i/p_i$ is not greater than 0.693.
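The bound, and the RM-safe test that uses its limiting value, can be computed directly. This Python sketch takes a task set as a list of (c, p) pairs with deadlines equal to periods. The function names are ours.

```python
import math


def rm_bound(n):
    """Liu and Layland worst-case RM utilization bound for n tasks: n(2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def rm_safe(tasks):
    """Sufficient RM test used as RM-safe: total utilization sum(c/p) <= log_e 2.
    tasks: list of (c, p) pairs (computation time, period)."""
    return sum(c / p for c, p in tasks) <= math.log(2)


for n in (1, 2, 3, 10, 100):
    print(n, round(rm_bound(n), 4))  # decreases towards math.log(2) = 0.6931...
```

Consider the task set (2, 4), (1, 5), whose utilization is 0.7. It fails the RM-safe test, which uses 0.693. It is nevertheless RM-schedulable, because it lies below the two-task bound of about 0.828.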

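The exact characterization for RM was given by Lehoczky, Sha, and Ding [LSD89]. They proved that, given periodic tasks $\phi_1, \phi_2, \ldots, \phi_n$ with request periods $p_1 \le p_2 \le \cdots \le p_n$, computation requirements $c_1, c_2, \ldots, c_n$, and phasings $h_1, h_2, \ldots, h_n$, task $\phi_i$ is schedulable using RM iff

$$\min_{t \in G_i} \frac{W_i(t)}{t} \le 1 \qquad (1)$$

where $W_i(t) = \sum_{j=1}^{i} c_j \lceil t/p_j \rceil$ is the cumulative demand on the processor by tasks $\phi_1, \ldots, \phi_i$ over $[0, t]$, $0$ is a critical instant (i.e., $h_i = 0$ for all $i$), and $G_i = \{k \cdot p_j \mid j = 1, \ldots, i;\ k = 1, \ldots, \lfloor p_i/p_j \rfloor\}$. The whole task set is RM schedulable iff (1) holds for every $i$, $1 \le i \le n$.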
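The exact test only needs the demand $W_i(t)$ at the scheduling points in $G_i$. This Python sketch of Equation (1) assumes integer task parameters, with deadlines equal to periods. The function names are ours.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def rm_exact(tasks):
    """Exact RM test of Equation (1), with deadlines equal to periods.
    tasks: list of (c, p) integer pairs; priorities follow the periods (shortest first)."""
    tasks = sorted(tasks, key=lambda cp: cp[1])
    for i, (_, p_i) in enumerate(tasks):
        first_i = tasks[: i + 1]  # tasks phi_1 .. phi_i
        # Scheduling points G_i = {k * p_j | j <= i, 1 <= k <= floor(p_i / p_j)}
        points = {k * p_j for _, p_j in first_i for k in range(1, p_i // p_j + 1)}
        # phi_i meets its deadline iff W_i(t) <= t at some scheduling point t
        if not any(sum(c_j * ceil_div(t, p_j) for c_j, p_j in first_i) <= t for t in points):
            return False
    return True
```

The set (1, 4), (2, 6), (3, 12) has a utilization of about 0.833, which exceeds the three-task bound of about 0.780. It still passes the exact test, because $W_3(12) = 10 \le 12$. By contrast, the set (2, 4), (3, 6) has utilization 1 and fails the test.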

Liu and Layland discussed the case when task deadlines coincide with request periods, whereas Lehoczky [L90] considered the fixed priority scheduling of periodic tasks with arbitrary deadlines and gave a feasibility characterization of RM in this case: given a task set with arbitrary deadlines $d_1 \le d_2 \le \cdots \le d_n$, $\phi_i$ is RM schedulable iff $\max_{k \le N_i} W_i(k, (k-1)p_i + d_i) \le 1$, where

$$W_i(k, x) = \min_{t \le x} \frac{\sum_{j=1}^{i-1} c_j \lceil t/p_j \rceil + k c_i}{t}$$

and $N_i = \min\{k \mid W_i(k, k p_i) \le 1\}$.

The worst-case utilization bound of RM with arbitrary deadlines was also derived in [L90]. This bound ($U_\infty$) depends on the common deadline postponement factor $\Delta$, i.e., $d_i = \Delta p_i$, $1 \le i \le n$:

$$U_\infty(\Delta) = \Delta \log_e \frac{\Delta + 1}{\Delta}, \quad \Delta = 1, 2, \ldots \qquad (2)$$

For $\Delta = 2$, the worst-case utilization increases from 0.693 to 0.811, and for $\Delta = 3$ it is 0.863.
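Equation (2) and the corresponding utilization test are easy to evaluate. This Python sketch is ours and assumes a common postponement factor, so that every deadline is $\Delta$ times its period.

```python
import math


def rm_arbitrary_bound(delta):
    """Worst-case RM utilization bound of Equation (2) when every deadline is
    delta times its period (d_i = delta * p_i)."""
    return delta * math.log((delta + 1) / delta)


def rm_arbitrary_safe(tasks, delta):
    """Sufficient test: total utilization sum(c/p) at most the bound for delta."""
    return sum(c / p for c, p in tasks) <= rm_arbitrary_bound(delta)


for delta in (1, 2, 3):
    print(delta, round(rm_arbitrary_bound(delta), 3))  # 0.693, 0.811 and 0.863
```

For $\Delta = 1$ the bound reduces to $\log_e 2$, the RM-safe value. Doubling the deadlines, as with $\Delta = 2$, admits task sets up to a utilization of about 0.811.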

Recently, a timing analysis for more general hard real-time periodic task sets on a uniprocessor using fixed-priority methods was proposed by Harbour et al. [HKL94].
Considering earliest-deadline-first dynamic priority scheduling, Liu and Layland [LL73] proved that a given task set is EDF schedulable iff

$$\sum_{i=1}^{n} \frac{c_i}{p_i} \le 1 \qquad (3)$$

and showed that the processor utilization can be as high as 100%.
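Equation (3) is a pure utilization test, so it takes a single line of Python. This sketch is ours and assumes deadlines equal to periods, as in [LL73].

```python
from fractions import Fraction


def edf_schedulable(tasks):
    """Liu and Layland EDF test of Equation (3): schedulable iff sum(c/p) <= 1.
    tasks: list of (c, p) integer pairs with deadlines equal to periods.
    Exact fractions avoid rounding errors when the utilization is exactly 1."""
    return sum(Fraction(c, p) for c, p in tasks) <= 1
```

The set (2, 4), (3, 6) has utilization exactly 1, so EDF accepts it. Under RM, the second task of this set misses its deadline, since $W_2(t) > t$ at both scheduling points 4 and 6 of Equation (1).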

Liu and Layland also discussed the case of Mixed Priority (MP) scheduling, where, given a task set $\phi_1, \phi_2, \ldots, \phi_n$, the first $k$ tasks $\phi_1, \ldots, \phi_k$, $k < n$, are scheduled using fixed priority assignments and the remaining $n - k$ tasks $\phi_{k+1}, \ldots, \phi_n$ are scheduled using dynamic priority assignments. It was shown that, with $a_k(t)$ denoting the accumulated processor time from 0 to $t$ available to the dynamically scheduled tasks, the task set is mixed priority schedulable iff

$$\sum_{i=k+1}^{n} \left\lfloor \frac{t}{p_i} \right\rfloor c_i \le a_k(t) \qquad (4)$$

for all $t$ which are multiples of $p_{k+1}$ or ... or $p_n$. Here, $a_k(t)$ can be computed as follows:

$$a_k(t) = t - \sum_{j=1}^{k} c_j \left\lceil \frac{t}{p_j} \right\rceil$$
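Equation (4) only needs to be evaluated at multiples of the periods of the dynamically scheduled tasks. This Python sketch does so for integer task parameters. Two choices in it are ours:

- It uses the ceiling form of $a_k(t)$ given above.
- It stops at the hyperperiod, the least common multiple of all periods. Both sides of (4) are additive over whole hyperperiods, so checking one hyperperiod suffices.

```python
from functools import reduce
from math import gcd


def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def mp_schedulable(fixed, dynamic):
    """Mixed-priority test of Equation (4).
    fixed:   (c, p) integer pairs of the first k tasks (fixed priority)
    dynamic: (c, p) integer pairs of the remaining n - k tasks (dynamic priority)"""
    periods = [p for _, p in fixed + dynamic]
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b), periods, 1)
    # Check every multiple of a dynamic task's period, up to one hyperperiod
    points = sorted({m for _, p in dynamic for m in range(p, hyperperiod + 1, p)})
    for t in points:
        demand = sum((t // p) * c for c, p in dynamic)             # left-hand side of (4)
        available = t - sum(ceil_div(t, p) * c for c, p in fixed)  # a_k(t)
        if demand > available:
            return False
    return True
```

With no fixed-priority tasks, $a_k(t) = t$, and the test accepts (2, 4), (3, 6), in line with the EDF condition.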



Although EDF dynamic priority scheduling achieves high processor utilization, in recent years fixed priority scheduling has received great interest from both academia and industry [LSD89, L90, SG90, HKL91, SKG91, TBW92, KAS93, HKL94].

Summarizing the above scheduling algorithms, we have five different cases of schedulability considerations:

• RM-safe: all task sets are schedulable as long as the server utilization is below $\log_e 2 = 0.693$,

• RM-exact: all task sets satisfying Equation (1) are schedulable,

• RM-arbitrary: all task sets are schedulable as long as the server utilization is below $\Delta \log_e((\Delta + 1)/\Delta)$ (Equation (2)),

• EDF: all task sets satisfying Equation (3) are schedulable,

• MP: all task sets satisfying Equation (4) are schedulable.



3 Client-Server Scheduling System Model 

Modeling a real-time system as a client-server scheduling system, our target system of verification consists of a constant number $m$ of servers that perform scheduling and a constant number $n$ of clients that issue scheduling requests. Each server adopts a scheduling policy. Each client is modeled with a client automaton such that the client issues different scheduling requests in various modes. On receiving a request for scheduling a set of tasks, a server decides whether the tasks are currently schedulable or not.

Definition 1. A Periodic Task

A periodic task is a tuple $\phi = (a, c, p, d, f)$, where $a$ is the identification of the server on which the task is to be processed, $c$ is the constant computation time of a job, $p$ is the request period of a job, $d$ is the deadline within which a job must be completed before the next job request occurs, and $f$ specifies whether the task must be scheduled using fixed priority or dynamic priority, that is, $f = 1$ for fixed priority and $f = 0$ for dynamic priority. Moreover, $c \le p$, $c \le d$, and $c, p, d \in \mathbb{N}$, the set of nonnegative integers. □
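Definition 1 maps naturally onto an immutable record that enforces its side conditions. The class name, the field names and the example values in this Python sketch are our own; none of them come from the paper's tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicTask:
    """A periodic task (a, c, p, d, f) as in Definition 1."""
    server: str   # a: identification of the server that processes the task
    c: int        # computation time of each job
    p: int        # request period
    d: int        # deadline of each job
    f: int        # 1 for fixed priority, 0 for dynamic priority

    def __post_init__(self):
        if not all(isinstance(v, int) and v >= 0 for v in (self.c, self.p, self.d)):
            raise ValueError("c, p and d must be non-negative integers")
        if self.f not in (0, 1):
            raise ValueError("f must be 1 (fixed priority) or 0 (dynamic priority)")
        if self.c > self.p or self.c > self.d:
            raise ValueError("Definition 1 requires c <= p and c <= d")


# Hypothetical task for a movie server S1 under fixed (rate-monotonic) priority.
movie_job = PeriodicTask(server="S1", c=5, p=30, d=30, f=1)
```

The constructor rejects a task whose computation time exceeds its period or deadline, such as `PeriodicTask("S2", 40, 30, 30, 0)`, by raising `ValueError`.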

Notationally, we let $T_{\mathcal{H}}$ be the universal set containing all possible tasks in a system $\mathcal{H}$. We model the behavior of clients with timed automata, which are automata enhanced with clocks. It is assumed that the current mode of each client is broadcast to all the clients in the same system. The behavior of a client in each mode can be expressed through a state predicate, which is a combination of propositions and timing inequalities on clock readings. Given a set of propositions $P$ and a set of clocks $X$, a state predicate $\eta$ of $P$ and $X$ has the following syntax.

$$\eta ::= \mathit{false} \mid r \mid x \sim a \mid x + a_1 \sim y + a_2 \mid \eta_1 \wedge \eta_2$$

where $r \in P$, $x, y \in X$, $a$, $a_1$, and $a_2$ are rational numbers, $\sim\ \in \{<, \le, =, \ge, >\}$, and $\eta_1$, $\eta_2$ are state predicates. Let $B(P, X)$ be the set of all state predicates on $P$ and $X$.
Given a set of propositions P and a set of clocks X, a client is modeled as follows. 

Definition 2. Client Automaton (CA)

A Client Automaton (CA) is a tuple $C = (M, m^0, P, X, \chi, \mu, E, \rho, \tau)$ with the following restrictions.

• $M$ is a finite set of modes.

• $m^0 \in M$ is the initial mode.

• $P$ is a set of atomic propositions.

• $X$ is a set of clocks.

• $\chi: M \mapsto B(P, X)$ is a function that labels each mode with a condition true in that mode.

• $\mu: M \mapsto 2^{T_{\mathcal{H}}}$ maps each mode to a finite subset of tasks in $T_{\mathcal{H}}$.

• $E \subseteq M \times M$ is the set of transitions.

• $\rho: E \mapsto 2^X$ maps a transition to a set of clocks that are reset on that transition.

• $\tau: E \mapsto B(P, X)$ maps each transition to a triggering condition. □

The CA $C$ starts execution at its mode $m^0$. We shall assume that initially all clocks read zero. In between transitions, all clocks increment at a uniform rate. A transition of the CA may be fired when its triggering condition is satisfied.

Definition 3. Servers

A server is a tuple $(a, \varphi)$, where $a$ is the unique identification of the server and $\varphi$ is the scheduling policy of the server. □

Now, with a set of servers, a set of client automata, and the ratio between the schedulability-check time unit and the job-computation time unit, we are ready to define what a scheduling system is.

Definition 4. Scheduling systems

A scheduling system $\mathcal{H}$ is defined as a tuple $(\{S_1, S_2, \ldots, S_m\}, \{C_1, C_2, \ldots, C_n\}, P, X, \Gamma)$, where $\{S_1, S_2, \ldots, S_m\}$ is a set of servers, $\{C_1, C_2, \ldots, C_n\}$ is a set of client automata, $P$ and $X$ are, respectively, the set of atomic propositions and the set of clocks used in $C_1, \ldots, C_n$, and $\Gamma$ is the ratio of a schedulability-check time unit to a job-computation time unit. □
Definition 5. States and their admissibility

Given a system $\mathcal{H} = (\{S_1, \ldots, S_m\}, \{C_1, \ldots, C_n\}, P, X, \Gamma)$ with $C_i = (M_i, m_i^0, P, X, \chi_i, \mu_i, E_i, \rho_i, \tau_i)$, a state $s$ of $\mathcal{H}$ is defined as a mapping from $\{1, \ldots, n\} \cup P \cup X$ to $\bigcup_{1 \le i \le n} M_i \cup \{\mathit{true}, \mathit{false}\} \cup \mathbb{R}^+$ such that

• $\forall i \in \{1, \ldots, n\}$, $s(i) \in M_i$ is the mode of $C_i$ in $s$;

• $\forall r \in P$, $s(r) \in \{\mathit{true}, \mathit{false}\}$ is the truth value of $r$ in $s$; and

• $\forall x \in X$, $s(x) \in \mathbb{R}^+$ is the reading of clock $x$ in $s$.

Further, a state $s$ is said to be admissible when:

• $s \models \bigwedge_{1 \le i \le n} \chi_i(s(i))$, and

• the task set $\bigcup_{1 \le i \le n} \mu_i(s(i)) \subseteq T_{\mathcal{H}}$ at $s$ is schedulable by the servers. □

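Definition 6. Satisfaction of state predicate by a state

State predicate $\eta$ is satisfied by a state $s$, written $s \models \eta$, according to the following rules:

• $s \not\models \mathit{false}$;

• $\forall r \in P$, $s \models r$ iff $s(r) = \mathit{true}$;

• $\forall x \in X$, $s \models x \sim a$ iff $s(x) \sim a$;

• $\forall x, y \in X$, $s \models x + a_1 \sim y + a_2$ iff $s(x) + a_1 \sim s(y) + a_2$; and

• $s \models \eta_1 \wedge \eta_2$ iff $s \models \eta_1$ and $s \models \eta_2$. □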
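The inductive rules of Definition 6 translate directly into a recursive evaluator. In this Python sketch, the tuple encoding of predicates, the dictionary encoding of states and the function name are illustrative assumptions. Rational constants can be passed as `fractions.Fraction` values if exact arithmetic is needed.

```python
import operator

# Comparison operators allowed for ~ in the grammar of state predicates
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def satisfies(state, eta):
    """Decide s |= eta following Definition 6.

    state: dict mapping proposition names to True/False and clock names to readings
    eta:   nested tuple, one of ("false",), ("prop", r), ("clock", x, op, a),
           ("diff", x, a1, op, y, a2) or ("and", eta1, eta2)
    """
    kind = eta[0]
    if kind == "false":
        return False
    if kind == "prop":
        return state[eta[1]] is True
    if kind == "clock":
        _, x, op, a = eta
        return OPS[op](state[x], a)
    if kind == "diff":
        _, x, a1, op, y, a2 = eta
        return OPS[op](state[x] + a1, state[y] + a2)
    if kind == "and":
        return satisfies(state, eta[1]) and satisfies(state, eta[2])
    raise ValueError(f"unknown predicate form: {kind!r}")


# Hypothetical state and predicate: clock x reads 35 while proposition "playing" holds.
s = {"playing": True, "x": 35, "y": 10}
eta = ("and", ("prop", "playing"), ("clock", "x", "=", 35))
```

Here `satisfies(s, eta)` holds. Conjoining `("clock", "y", ">=", 11)` makes the predicate false, because clock y reads only 10.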

Definition 7. Mode Transition

Given a system $\mathcal{H} = (\{S_1, \ldots, S_m\}, \{C_1, \ldots, C_n\}, P, X, \Gamma)$ with $C_i = (M_i, m_i^0, P, X, \chi_i, \mu_i, E_i, \rho_i, \tau_i)$, and two states $s$ and $s'$, there is a mode transition from $s$ to $s'$ in $\mathcal{H}$, in symbols $s \rightarrow s'$, iff

• both $s$ and $s'$ are admissible states, and

• there is an $i$, $1 \le i \le n$, such that

  - $(s(i), s'(i)) \in E_i$;

  - $s \models \tau_i(s(i), s'(i))$;

  - for all $1 \le j \le n$ with $j \ne i$, $s(j) = s'(j)$; and

  - $\forall x \in X\ ((x \in \rho_i(s(i), s'(i)) \Rightarrow s'(x) = 0) \wedge (x \notin \rho_i(s(i), s'(i)) \Rightarrow s'(x) = s(x)))$. □

Given a state $s$ and a $\delta \in \mathbb{R}^+$, we let $s + \delta$ be the state that agrees with $s$ in every aspect except that, for all $x \in X$, $(s + \delta)(x) = s(x) + \delta$.
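Definition 7 and the notation $s + \delta$ describe the two ways a state can evolve. This Python sketch implements both for a dictionary-encoded state. The following are our assumptions:

- the state encoding;
- the function names;
- the `admissible` callback, which stands in for the schedulability check of Definition 5.

```python
def elapse(state, delta):
    """s + delta: every clock advances by delta; modes and propositions are unchanged."""
    if delta < 0:
        raise ValueError("time cannot go backwards")
    return {**state, "clocks": {x: v + delta for x, v in state["clocks"].items()}}


def mode_transition(state, i, target, edges, admissible):
    """Successor of `state` when client i moves to mode `target` (Definition 7), or None.

    edges:      dict (source, target) -> (trigger, resets); trigger(state) -> bool is the
                triggering condition, resets is the set of clocks reset by the transition
    admissible: admissible(state) -> bool, standing in for Definition 5
    """
    source = state["modes"][i]
    if (source, target) not in edges or not admissible(state):
        return None
    trigger, resets = edges[(source, target)]
    if not trigger(state):
        return None
    modes = list(state["modes"])
    modes[i] = target  # only client i changes mode
    clocks = {x: 0 if x in resets else v for x, v in state["clocks"].items()}
    successor = {**state, "modes": tuple(modes), "clocks": clocks}
    return successor if admissible(successor) else None


# Hypothetical client 0: leave mode "play" for "idle" when x = 35, resetting x.
edges = {("play", "idle"): (lambda s: s["clocks"]["x"] == 35, {"x"})}
s0 = {"modes": ("play", "idle"), "props": {}, "clocks": {"x": 0, "y": 0}}
s1 = elapse(s0, 35)
s2 = mode_transition(s1, 0, "idle", edges, admissible=lambda s: True)
```

After 35 time units of time passage, the transition fires: client 0 enters mode "idle", clock x is reset to 0 and clock y keeps its reading of 35. From the initial state, the same transition is refused because its triggering condition does not hold.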

3.1 Semantics of Schedulability Checking

The admissibility of a new state, that is, the schedulability check, can be implemented in one of two ways: (1) checking before the transition, or (2) checking after the transition. In the former case, when a client is in a particular mode (possibly executing some tasks) and an outgoing transition is enabled, it must first check with the servers by sending scheduling requests before the outgoing transition is taken. In the latter case, when a client is in a particular mode and an outgoing transition is enabled, the client may take the transition and then check whether the tasks in the new mode are schedulable.

(a) Scheduling Check Before Transition (SCBT): The semantics here differ mainly in how long a client automaton can stay in a mode. Here, we propose three possibilities.
• (SPR) Saturated Parallel Request: If more than one outgoing transition of the currently executing mode is enabled concurrently, then the client keeps on issuing scheduling requests to all servers according to the task sets of its next states. Once a positive response comes back for any one next state, the client can make the corresponding transition. If more than one positive response is received, the client makes all corresponding transitions in parallel (parallelism is implemented as interleaving of transition sequences). In this semantics, the duration time of a mode must be greater than the schedulability-check computation time for the corresponding next state. This semantics needs minimal modification to translate into HyTech input form.

• (SQR) Sequential Request: The client nondeterministically chooses a next state and polls the servers specified in the task set of that next state for a schedulability check. No request to any server is issued until the last request has been replied to. Only after the response comes back may the client test another next state. Thus the duration time a client can stay in a mode must be the sum of a sequence of schedulability-check computation times.

• (NPR) Non-saturated Parallel Request: The client polls all the servers for all its next states. Once a reply comes back, the client chooses whether or not to take the corresponding transition. If it does not make the transition at that moment, it issues another schedulability request for the same next state. In this semantics, the duration time must be at least a multiple of the schedulability-check computation time for a particular next state.

(b) Scheduling Check After Transition (SCAT): In this case, the modularity of the system specifications is preserved and transitions occur according to the timed automata semantics. Scheduling systems implemented using this scheme of schedulability checking have two semantics related to scheduling, namely strict scheduling semantics and loose scheduling semantics.

• (SSS) Strict Scheduling Semantics: In a particular mode, it may happen that the specified task set cannot be scheduled before an outgoing transition is enabled. If in this situation the client is not allowed to make the enabled transition out of the unscheduled mode, we call it strict scheduling semantics.

• (LSS) Loose Scheduling Semantics: In a specific mode, when the specified tasks are not scheduled (i.e., the schedulability check returns a negative response) before an outgoing transition is enabled, the client may choose either to keep on issuing scheduling requests for the unscheduled task set or to move to the next mode by making the enabled transition. This is called loose scheduling semantics, and it results in a larger global state space, as shown by the examples in Section 4.

The computation of our scheduling system is defined in the following. 



Definition 8. s-run

Given a system $\mathcal{H}$ and a state $s$ of $\mathcal{H}$, a computation of $\mathcal{H}$ starting at $s$, called an $s$-run, is a sequence $((s_1, t_1), (s_2, t_2), \ldots)$ of pairs such that

• $s = s_1$; and

• for each $t \in \mathbb{R}^+$, $\exists j \in \mathbb{N}$ such that $t_j \ge t$; and

• for each integer $j \ge 1$ and each real $0 \le \delta \le t_{j+1} - t_j$, $s_j + \delta$ is admissible and $s_j + \delta \models \bigwedge_{1 \le i \le n} \chi_i(s_j(i))$; and

• for each $j \ge 1$, $\mathcal{H}$ goes from $s_j$ to $s_{j+1}$ because of

  - a mode transition, i.e., $t_j = t_{j+1} \wedge s_j \rightarrow s_{j+1}$; or

  - time passage, i.e., $t_j < t_{j+1} \wedge s_j + (t_{j+1} - t_j) = s_{j+1}$.

• The duration time a client can stay in a mode must satisfy the chosen semantics. □

4 Implementation 

The theoretical framework of a Client Server Scheduling System Model as presented in 
Section 3 has been implemented into a practical tool for verifying scheduling systems. 
The implementation mainly constitutes two parts: scheduling check time computation 
and translating a scheduling system description into a pure timed automata specifica- 
tion. The resulting timed automata specification can be seen as a special case of linear 
hybrid automata, hence the popular tool called HyTech is used for verifying our resul- 
tant system descriptions. 

The two semantics of scheduling check before and after transition have both been 
implemented into our translator tool. Experiments have been conducted with several 
application examples from both hardware and software. Though the degree of advantage 
in using our proposed approach for verifying scheduling systems vary, yet all of the 
examples show an appreciable amount of decrease in the size of the reachable state 
space required for verification. 

4.1 Scheduling Check Time Computation 

Before entering a state $s$, a system must check with the servers whether it is an admissible
state, that is, whether all the tasks ($\bigcup_{1 \le i \le n} \mu_i(s(i))$) in that state are schedulable by the servers.
This schedulability check is performed exclusively by each client, which locks the servers,
and it takes a small amount of time that depends on the scheduling algorithms used by the
servers. Usually the computation for the schedulability check is very small compared to the
scheduled job computation time. However, as the number of contending processes increases,
some scheduling policies may consume an amount of time that cannot be considered negligible.
For example, for a 200 MHz CPU, the processor cycle time is around $5 \times 10^{-9}$ seconds, and
if a single instruction takes 2 cycles, the CPU requires only $10^{-8}$ seconds for one processor
operation. At the same time, one tick of scheduled job computation time in a real-time system
is usually of the order of a millisecond (ms). Hence, the ratio of a server operation time to a
job computation time unit is $10^{-5}$. Normally, the size of a task set in a real-time system is of
the order of 10 to 100. A schedulability-check time linear in the size of the task set would be
negligible compared to the computation time of the task set, but if it is quadratic, it would be
of the order of one job time unit.
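The arithmetic above can be reproduced with a short Python sketch. The constants are the ones quoted in the text; the helper `check_time` and the cost model (an O(n^k) test is charged n^k processor operations, ignoring constant factors) are illustrative assumptions, not part of the authors' tool.

```python
# Back-of-the-envelope cost of a schedulability check, using the figures
# quoted in the text. The n**k cost model is an assumption that ignores
# constant factors.

CPU_HZ = 200e6        # 200 MHz processor
CYCLES_PER_OP = 2     # one processor operation takes 2 cycles
TICK_SECONDS = 1e-3   # one job computation time unit (1 ms tick)

cycle_time = 1.0 / CPU_HZ               # about 5e-9 s
op_time = CYCLES_PER_OP * cycle_time    # about 1e-8 s
t_op = op_time / TICK_SECONDS           # about 1e-5 job time units per operation


def check_time(num_tasks: int, k: int) -> float:
    """Check time, in job time units, of a test needing num_tasks**k operations."""
    return (num_tasks ** k) * t_op


for n in (10, 100):
    print(f"n={n:3d}  linear={check_time(n, 1):.1e}  quadratic={check_time(n, 2):.1e}")
```

For 100 tasks the quadratic test costs about a tenth of a job time unit, two orders of magnitude more than the linear one.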

For analyzing the amount of time required for the schedulability check, we define the
set of tasks in some state $s$ that are to be scheduled on some particular server $S_k$
using some scheduling algorithm $R_k$, $1 \le k \le m$:

$$\psi_s(R_k) = \{\phi \mid \phi = (S_k, c, p, d, l),\ \phi \in \bigcup_{1 \le i \le n} \mu_i(s(i))\} \qquad (5)$$

Fig. 2. Schedulability Check Time



The schedulability check time required for each of the five variations of priority
scheduling described in Section 2 is illustrated in Fig. 2 for different ratios of server
operation time to computation time unit. We observe that the schedulability check time
is negligible when the ratio is of the order of $10^{-5}$. We make the following
assumptions:

• The execution of all jobs of each task starts at integer-valued time instants.

• A schedulability check is assumed to take 1 computation time unit when it takes no
more than 1 unit, and 2 time units when it takes between 1 and 2 units; that is,
the schedulability check time is rounded up to the next integer if it is not already
an integer.

As far as RM-safe, RM-arbitrary, and EDF priority scheduling are concerned, the
schedulability check depends only on the total utilization of a server. As long as
the utilization is below the respective bounds of $n(2^{1/n} - 1)$, $\Delta \log_e \frac{\Delta + 1}{\Delta}$, and 100%,
all tasks of all phasings, request periods, and deadlines can be scheduled. This check
requires time linear in $|\psi_s(R)|$, where $R$ is RM-safe, RM-arbitrary, or EDF.
Let $t_{op}$ denote the ratio of a processor operation time to a job computation
time unit; for example, $t_{op}$ is of the order of $10^{-5}$ for a 200 MHz CPU and
a 1 ms tick. The times spent by a particular server on checking schedulability under
RM-safe, RM-arbitrary, and EDF are then as follows.

$$\gamma_s(RM_{safe}) = |\psi_s(RM_{safe})| \times t_{op} \qquad (6)$$

$$\gamma_s(RM_{arb}) = |\psi_s(RM_{arb})| \times t_{op} \qquad (7)$$

$$\gamma_s(EDF) = |\psi_s(EDF)| \times t_{op} \qquad (8)$$

For RM-exact (Equation 1) scheduling, the schedulability check time is as follows.

$$\gamma_s(RM_{exact}) = |\psi_s(RM_{exact})| \times p_{|\psi_s(RM_{exact})|} \times t_{op} \qquad (9)$$

where $p_{|\psi_s(RM_{exact})|}$ is the largest period in $\psi_s(RM_{exact})$. For mixed priority scheduling,
the schedulability check time is as follows.

$$\gamma_s(MP) = |\psi_s(MP)| \times \mathrm{LCM}\{p_k \mid \phi = (S, c, p_k, d, l) \in \psi_s(MP)\} \times t_{op} \qquad (10)$$

where LCM is the least common multiple and $\phi = (S, c, p_k, d, l)$ is a task in state $s$ that
is to be scheduled using dynamic priority.

Hence, the total time spent on the schedulability check during a transition to a state $s$
in a system $\mathcal{H} = ((S_1, S_2, \ldots, S_m), (R_1, R_2, \ldots, R_m), (C_1, C_2, \ldots, C_n), \ldots, t_{op})$
is as follows.

$$\gamma_s = \max_{1 \le k \le m} \gamma_s(R_k) \qquad (11)$$

where $R_k \in \{RM_{safe}, RM_{exact}, RM_{arb}, EDF, MP\}$.
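Equations (6)-(11), together with the rounding assumption stated earlier, can be turned into a small calculator. This is a minimal sketch under stated assumptions: tasks are simplified to (c, p, d) triples (the server and remaining tuple fields are omitted), policies are identified by the hypothetical strings `"RMsafe"`, `"RMarb"`, `"EDF"`, `"RMexact"`, and `"MP"`, and each server is given as a (policy, tasks) pair.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Task:
    """Simplified task: computation time, period, deadline (hypothetical layout)."""
    c: int
    p: int
    d: int


def _lcm(a: int, b: int) -> int:
    return a * b // math.gcd(a, b)


def server_check_time(policy: str, tasks: list, t_op: float) -> float:
    """gamma_s(R) for one server, following Eqs. (6)-(10)."""
    n = len(tasks)
    if n == 0:
        return 0.0
    if policy in ("RMsafe", "RMarb", "EDF"):
        # Eqs. (6)-(8): a utilization-bound test, linear in the task count.
        return n * t_op
    if policy == "RMexact":
        # Eq. (9): scales with the largest period in the set.
        return n * max(t.p for t in tasks) * t_op
    if policy == "MP":
        # Eq. (10): scales with the LCM of the periods.
        return n * reduce(_lcm, (t.p for t in tasks)) * t_op
    raise ValueError(f"unknown policy: {policy!r}")


def total_check_time(servers: list, t_op: float) -> int:
    """Eq. (11): maximum over servers, rounded up to whole time units."""
    gamma = max((server_check_time(pol, ts, t_op) for pol, ts in servers),
                default=0.0)
    return math.ceil(gamma)


tasks = [Task(c=1, p=4, d=4), Task(c=1, p=6, d=6)]
servers = [("EDF", tasks), ("MP", tasks)]
print(total_check_time(servers, t_op=1e-5))  # 1: a tiny check is still charged one unit
```

Rounding with `math.ceil` mirrors the assumption that a check shorter than one unit still occupies a whole computation time unit.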

The difficulty in implementing the scheduling check time computation lies in the fact
that for the RMexact and MP scheduling policies, the periods of all the currently executing
tasks must be known (refer to Equations (9) and (10)); hence we must consider all
possible combinations of client modes for a complete check time computation. This
also implies that the mode status of each client must be broadcast to all the other clients
in the scheduling system. This broadcast has been implemented in our translator.
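Why every combination of client modes matters can be illustrated by brute-force enumeration with `itertools.product`. This sketch assumes a single RM-exact server and Eq. (9); the client names, modes, and periods are hypothetical.

```python
from itertools import product

# Hypothetical clients: mode name -> periods of the tasks the client asks a
# single RM-exact server to schedule in that mode.
CLIENTS = {
    "C1": {"idle": [], "play": [4, 8]},
    "C2": {"idle": [], "ad": [6]},
}


def worst_case_check_time(clients, t_op):
    """Maximize the Eq. (9) check time over every combination of client modes."""
    names = list(clients)
    worst, worst_modes = 0.0, None
    for modes in product(*(clients[name] for name in names)):
        periods = [p for name, mode in zip(names, modes) for p in clients[name][mode]]
        gamma = len(periods) * max(periods) * t_op if periods else 0.0
        if worst_modes is None or gamma > worst:
            worst, worst_modes = gamma, dict(zip(names, modes))
    return worst, worst_modes


print(worst_case_check_time(CLIENTS, t_op=1.0))  # (24.0, {'C1': 'play', 'C2': 'ad'})
```

The number of combinations is the product of the clients' mode counts, so this enumeration grows quickly with the number of clients.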

4.2 Translator 

We developed a translator that translates the client-server scheduling system specification
(in our own input language) into a HyTech specification. Although a scheduling
system can be specified directly in the HyTech input language, the specification would
be very lengthy, tedious, and error-prone. In our input language, the specification is
short and compact, and the translation is done systematically, thus avoiding human
errors. For example, in the real-time operating system example (described in subsection
4.3), the specification in our input language consisted of only 12 modes and 17
transitions, whereas the resulting translation into the HyTech input language consisted of
58 modes and 416 transitions. Thus, the translator is a necessity for verifying scheduling
systems.

HyTech [HHWT95] is a popular tool for verifying systems modeled as linear hybrid
automata. It has been used to verify various systems, such as a gas burner, a railroad
crossing controller, and Corbett's distributed controller, and protocols such as Fischer's
mutual exclusion protocol. Each client automaton is implemented as a linear hybrid
automaton in HyTech, and the analysis tool is used to verify our system.

The different scheduling semantics lead to different implementation schemes. For the
scheduling check before transition (SCBT) semantics, we have a transition-oriented
implementation, and for the scheduling check after transition (SCAT) semantics, we have
a mode-oriented implementation.



SCBT Implementation As illustrated in Fig. 3, each mode in the scheduling system
description is implemented as a simple job execution location called RunJobs, whereas each
mode transition is implemented as a set of three interconnected locations called Lock,
SchedCheck, and Error. The purposes of these locations are, respectively, locking
the servers, checking the schedulability of the tasks in the next mode (the destination
mode of the transition under consideration), and resetting internal variables when
a negative response is received in SchedCheck. There is a location transition from RunJobs
to Lock and one from SchedCheck (on a positive response from the servers) to the
RunJobs location of the destination mode. The triggering condition and the assignment
statements of the transition under consideration are attached to the location transition
from RunJobs to Lock and to the location transition from SchedCheck to the RunJobs
location of the next mode, respectively. The locking mechanism is similar to that for SCAT
and is described in the SCAT implementation below. Saturated parallel request (SPR) of SCBT
requires the least modification with respect to HyTech, so only SPR was implemented.

Fig. 3. Scheduling-Check Before Transition Implementation



SCAT Implementation Each mode of a client automaton is implemented by four locations,
namely Lock, SchedCheck, RunJobs, and Error. A client must check the admissibility
of a mode before entering it, and this check must be done exclusively by one client at a
time, because otherwise the schedulability check would not be consistent. To ensure
exclusiveness of the schedulability check, we employ a simpler version of Fischer's
mutual exclusion protocol [L87] and a lock semaphore variable $l$. Before performing the
schedulability check, a client obtains ownership of $l$ by setting $l$ to its identification
number so that it can exclusively do the checking. A client waits in location Lock for $l$
to be free (i.e., $l = 0$), and when it is free, the client sets $l$ to its identification number.
The schedulability check is done in location SchedCheck if $l$ is still set to the client's own
identification number; otherwise the client returns to location Lock. After the schedulability
check, the client changes mode in location SchedCheck; if the tasks are schedulable, the jobs
of the scheduled tasks are executed in location RunJobs, otherwise location Error is entered.

Fig. 4. SCAT: SSS Implementation    Fig. 5. SCAT: LSS Implementation

As illustrated in Fig. 4, in the case of strict scheduling semantics (SSS), the loca- 
tion Error has only one out-going transition to location Lock since the tasks must be 
scheduled and executed in the mode before any mode transition occurs. The location 
transition from RunJobs to Lock implements a mode transition in the client automaton. 
As illustrated in Fig. 5, in the case of loose scheduling semantics (LSS), the location Er- 
ror has two out-going transitions: one to the location Lock of the current mode (just as 
in the case of strict scheduling semantics) and one to the location Lock of the next mode. 
The two location transitions from location Error to the two Lock locations implement a 
mode transition. 
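The SCAT location structure described above can be written down as a small graph builder. This Python sketch is our own illustration, not the translator's code: location names follow the text, the function names `scat_edges` and `edge_count` are hypothetical, and the lock-retry edges (returning to Lock when $l$ has been taken by another client) are left out for brevity.

```python
from collections import defaultdict


def scat_edges(modes, transitions, semantics):
    """Location-level edges of one SCAT client automaton (sketch).

    Each mode m becomes locations (m, "Lock"), (m, "SchedCheck"),
    (m, "RunJobs") and (m, "Error"). Lock-retry edges are omitted.
    """
    if semantics not in ("SSS", "LSS"):
        raise ValueError("semantics must be 'SSS' or 'LSS'")
    succ = defaultdict(set)
    for m in modes:
        succ[(m, "Lock")].add((m, "SchedCheck"))     # lock obtained (l == id)
        succ[(m, "SchedCheck")].add((m, "RunJobs"))  # positive response
        succ[(m, "SchedCheck")].add((m, "Error"))    # negative response
        succ[(m, "Error")].add((m, "Lock"))          # retry in the current mode
    for src, dst in transitions:
        succ[(src, "RunJobs")].add((dst, "Lock"))    # ordinary mode transition
        if semantics == "LSS":
            succ[(src, "Error")].add((dst, "Lock"))  # leave an unscheduled mode
    return succ


def edge_count(succ):
    return sum(len(targets) for targets in succ.values())


modes, transitions = ["A", "B"], [("A", "B"), ("B", "A")]
print(edge_count(scat_edges(modes, transitions, "SSS")),
      edge_count(scat_edges(modes, transitions, "LSS")))  # 10 12
```

The only difference between the two semantics is the extra Error-to-Lock edge per mode transition under LSS, which gives the model checker more behaviours to explore.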

4.3 Application Examples 

To illustrate the generality of our approach, we demonstrate its benefits on three different
types of systems: a hardware system such as a video-on-demand (VOD) system,
a software system such as a real-time operating system (RTOS), and an agent system
such as a package delivery system (PDS).

There are two servers in the video examples (just as in Fig. 1). The movie server
schedules tasks with the rate-monotonic (safe) policy, while the commercial server does
so with the earliest-deadline-first policy. For the real-time OS example, there are four
servers, OS kernel, display, memory, and printer, which use the rate-monotonic (safe),
earliest-deadline-first, rate-monotonic (arb), and rate-monotonic (exact) policies,
respectively, for scheduling the tasks. For the delivery system example, it is assumed that
there are three delivery agents and four clients. The delivery agents must deliver packages
to the clients according to the following scheduling policies: rate-monotonic (exact),
earliest-deadline first, and mixed scheduling.

Table 1. Comparison of Pure Model Checking and Our Approach

               Specifications           Number of regions (convex predicates)
                                        SCBT (SPR)          SCAT (SSS)          SCAT (LSS)
Example       |S|  |C|  |∪Mi|  |∪Ei|    Pmc   Smc    %      Pmc   Smc    %      Pmc   Smc    %
VOD (Fig. 1)   2    -     8      9      139     -     -     110    68  61.8     120    78     -
VOD1           2    3     6      7        -     -     -     141    92     -     147    98     -
VOD2           2    3     9     11      114    46  40.3      80    34  42.5     107    61  57.0
RTOS1          4    3    11     14     2962  2247  75.9    1980  1486  75.0    2054  1560  75.9
RTOS2          4    3    12     16      830   306  36.9     684   256  37.4     728   300  41.2
PDS1           3    4     6      6     4717  3708  78.6    2114  1610  76.1    2140  1636  76.5
PDS2           3    4     6      6     3306  1554  47.0    1193   536  44.9    1204   547  45.4

S: set of servers, C: set of clients, |∪Mi|: total number of modes, |∪Ei|: total number of transitions
Pmc: Pure Model Checking, Smc: Scheduling System Model Checking, %: Smc/Pmc

Two versions are given for each of the three kinds of systems. All six examples
were specified in our input language, which was then automatically translated by our
translator into the HyTech input language. The results, tabulated in Table 1, show that
our approach indeed reduces the total size of the system state space for verification
compared to the pure model checking approach. Here, pure model checking means that
we do not take advantage of the scheduling algorithms and directly verify the systems,
which might contain many unschedulable states. Drastic reductions can be achieved in
systems that have a heavy workload. For each type of example, VOD or RTOS,
it is observed that with a higher complexity of the client automata (i.e., more
modes and transitions), the SCBT implementation shows a larger benefit (i.e., a smaller
state space) than both semantics of the SCAT implementation. This is due
to the stronger semantics of SCBT, in which a transition does not occur before the
schedulability of the tasks in its destination mode has been checked. Comparing the two
semantics of SCAT, SSS and LSS, it is observed in all the examples that the strict semantics
shows a larger benefit with our approach. This is due to the stronger restriction in SSS that
tasks must be scheduled before the client can progress. Thus, both theoretically and
experimentally, SCBT has the strongest notion of schedulability, LSS of SCAT has the
weakest, and SSS of SCAT lies in between.
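The orderings claimed above can be checked directly against the region counts in Table 1. This sketch recomputes Smc/Pmc for the five rows with complete entries (VOD2 to PDS2); the recomputed percentages agree with the published ones to within 0.1.

```python
# (Pmc, Smc) pairs from Table 1, in the order SCBT (SPR), SCAT (SSS), SCAT (LSS).
TABLE1 = {
    "VOD2":  [(114, 46), (80, 34), (107, 61)],
    "RTOS1": [(2962, 2247), (1980, 1486), (2054, 1560)],
    "RTOS2": [(830, 306), (684, 256), (728, 300)],
    "PDS1":  [(4717, 3708), (2114, 1610), (2140, 1636)],
    "PDS2":  [(3306, 1554), (1193, 536), (1204, 547)],
}


def percent(pmc: int, smc: int) -> float:
    """Regions explored by our approach as a percentage of pure model checking."""
    return 100.0 * smc / pmc


for name, cells in TABLE1.items():
    scbt, sss, lss = (percent(pmc, smc) for pmc, smc in cells)
    print(f"{name:6s} SCBT {scbt:5.1f}%  SSS {sss:5.1f}%  LSS {lss:5.1f}%  "
          f"SSS < LSS: {sss < lss}")
```

SSS beats LSS in every row, and for the more complex VOD2 and RTOS2 specifications SCBT beats both SCAT variants.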

5 Conclusion 

Model checking, though a popular verification method, has yet to be made more efficient
for verifying today's highly complex systems. We have shown how complex
real-time systems can be easily verified using the popular model-checking approach if
we model the complex system as a client-server scheduling system and then verify it.
This approach is meaningful because almost all complex systems need some sort of
scheduling so that the tasks can be executed consistently and efficiently.
The feasibility of this preliminary effort has been shown through an implementation using
our translator and the HyTech verification tool. Different semantics have been implemented
and compared using several examples. Our future work will include the development of
a tool devoted to the verification of such systems using symbolic model checking.
Abstract. Digital controllers found in many industrial real-time systems
consist of a number of interacting periodic tasks. To sustain the
required control quality, these tasks possess the maximum activation pe- 
riods as performance constraints. An essential step in developing a real- 
time system is thus to assign each of these tasks a constant period such 
that the maximum activation requirements are met while the system 
utilization is minimized [3]. 

Given a task graph design allowing producer/consumer relationships 
among tasks [4], resource demands of tasks, and range constraints on 
periods, the period assignment problem falls into a class of nonlinear 
optimization problems. This paper proposes a polynomial time approx- 
imation algorithm which produces a solution whose utilization does not 
exceed twice the optimal utilization. Our experimental analysis shows 
that the proposed algorithm finds solutions which are very close to the 
optimal ones in most cases of practical interest. 



1 Introduction 

Real-time systems often consist of a number of interacting tasks. Most of these
tasks execute periodically at fixed rates, reading data from producer tasks and
writing data to consumer tasks. A typical example is a feedback control system,
in which a control task periodically samples input data, computes control laws, and
generates output for actuators.

In order to sustain the required quality of control, each real-time task pos- 
sesses as its timing constraint the maximum activation period that is derived 
from a given performance specification. An essential step in developing a real- 
time system is thus to assign each task a constant period such that the maximum 
activation requirements are met while the system utilization is minimized. Where 
intermediate tasks are shared by several others that run with different rate con- 
straints, the problem falls into a class of nonlinear optimization problems. 

* The work reported in this paper was supported in part by Engineering Research 
Center for Advanced Control and Instrumentation (ERC-ACI) under Grant 96K3- 
0707-02-06-3 and by KOSEF under Grants 97-0102-05-01-3 and 981-0924-127-2. 
In [3] Gerber et al. addressed this problem as part of the real-time system
design problem and formulated it as a nonlinear optimization problem in their
design methodology. They proposed a set of heuristics which can be used to
reduce the search space of the problem. Unfortunately, these heuristics possess
only limited utility, since they may still leave a prohibitively large search space
even for a problem with a modest size. To the best of our knowledge, no polyno- 
mial time approximation algorithm with a known performance bound has been 
proposed to address this period assignment problem. 

A similar problem was addressed by Seto et al. [9] for the design of real-time 
control systems. Given that the control performance requirement and the schedu- 
lability constraint determine the ranges of sampling periods, their proposed algo- 
rithm derives schedulable sampling periods that optimize the performance index 
of the control system. However, the application model in [9] is rather simple: in 
[9], all tasks in a system run independently, while the task graph in [3] allows 
producer/consumer relationships and data sharing between tasks. 

In this paper, we propose a polynomial time period assignment algorithm
for the real-time design problem formulated in [3]. Our algorithm makes use of
two effective heuristics, namely (1) optimal GCD assignment and (2) harmonic
period assignment among tail tasks. In the first heuristic, a producer task is
assigned as its period the greatest common divisor of the periods of its consumers.
In the second, tasks with no consumers (tail tasks) are assigned harmonic periods.
We formally prove via worst-case analysis that the proposed algorithm
with these heuristics yields a solution whose utilization does not exceed twice
the optimal utilization. In practice, our experimental analysis shows that the
algorithm produces solutions which are very close to the optimal ones in most cases
of practical interest.

This paper is organized as follows. In section 2, we give a formal specification 
of the period assignment problem by defining a task graph design of a real-time 
system, along with the constraints and the objective function of the problem. In 
section 3, we propose two heuristics and present the approximation algorithm. 
In section 4, we make a formal analysis of the proposed algorithm and prove 
its worst case performance bound. In section 5, we experimentally evaluate the 
algorithm by comparing it with an optimal algorithm using exhaustive search. 
Section 6 concludes the paper. 

2 Problem Formulation 

A real-time system in our problem is represented by a directed acyclic graph
$G(V, E)$, where $V = \{p_1, p_2, \ldots, p_n\}$ is a set of tasks and $E$ is a set of directed edges
between tasks. Each task $p_i$ has a bounded execution time $C_i$, which can be
estimated in various ways [7]. An edge $p_i \rightarrow p_j$ denotes a producer/consumer
relationship in that $p_i$ produces data that $p_j$ consumes [4].

Let $T_i$ be the period of $p_i$. Each producer/consumer pair $(p_i, p_j)$ is constrained
to have harmonic periods such that $T_j$ is an integer multiple of $T_i$,
which is denoted $T_i \mid T_j$. Figure 1 shows an example task graph with
assigned harmonic periods. If these tasks ran with arbitrary periods, task
executions would get out of phase, resulting in large communication latencies. The
harmonicity constraint ensures that the two tasks stay "in phase," thus reducing
communication latencies [3]. It also guarantees predictable data flow from a
common producer to multiple consumers. For example, in Figure 1, it is guaranteed
that $p_4$ receives every data item that $p_3$ generates, while another consumer of $p_3$
gets every third item that $p_3$ produces. In the remainder of the paper, we use the
following additional notation.



notation        description
$V_{tail}$      A set of tasks having no outgoing edges.
$V^i_{tail}$    A set of tail tasks $p_j$ which are reachable from $p_i$.
$V^i_{succ}$    A set of tasks having an incoming edge from $p_i$.

Every output task $p_i \in V_{tail}$ possesses a range constraint on $T_i$ such that
$1 \le T_i \le T^u_i$, where $T^u_i$ is the maximum period constraint derived from
the performance requirement. Due to the harmonicity constraints, each non-tail task
$p_i$ is also subject to a period constraint $1 \le T_i \le T^u_i$, where $T^u_i$ is the
smallest maximum period of all its consumer tasks. Therefore, $T^u_i$ of a non-tail task
$p_i$ is determined such that $T^u_i = \min\{T^u_j \mid p_j \in V^i_{succ}\} = \min\{T^u_j \mid p_j \in V^i_{tail}\}$.

The objective of this period assignment problem is to maximize the
chance of the system being schedulable. As in [3], we adopt as the objective
function the minimization of the utilization $U = \sum_{i=1}^{n} C_i / T_i$. There have been a number
of vastly different measures of schedulability, depending on the real-time scheduling
algorithm, such as nonpreemptive, calendar-based scheduling [11] and preemptive,
static and dynamic priority scheduling [6]. In this paper, we assume a preemptive
priority scheduling strategy. Note that in preemptive priority scheduling
such as RMS and EDF, utilization has been shown to be a sufficient measure of the
schedulability of real-time systems [6,5].

Finally, the problem at hand is stated as follows.

Given a task graph and the range constraints on task periods, assign
each task a harmonic period such that the range constraints are met and
the system utilization $U$ is minimized, where $U = \sum_{i=1}^{n} C_i / T_i$.
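To make the formulation concrete, here is a minimal Python sketch that evaluates the objective and checks the constraints for a candidate assignment. The dictionary-based representation and the three-task example are hypothetical; exact `Fraction` arithmetic is used so that utilizations compare without rounding error.

```python
from fractions import Fraction


def utilization(C, T):
    """U = sum_i C_i / T_i, computed exactly with fractions."""
    return sum(Fraction(C[p], T[p]) for p in C)


def is_feasible(edges, T, T_max):
    """Harmonicity on every producer -> consumer edge, plus range constraints.

    Only tail tasks carry an explicit T_max; the bounds on non-tail tasks
    follow from harmonicity, so they are not checked separately.
    """
    positive = all(t >= 1 for t in T.values())
    harmonic = all(T[cons] % T[prod] == 0 for prod, cons in edges)
    in_range = all(T[p] <= T_max[p] for p in T_max)
    return positive and harmonic and in_range


C = {"p": 1, "x": 2, "y": 3}          # hypothetical execution times
edges = [("p", "x"), ("p", "y")]     # p produces data for x and y
T_max = {"x": 10, "y": 15}           # maximum periods of the tail tasks
T = {"p": 5, "x": 10, "y": 15}
print(is_feasible(edges, T, T_max), utilization(C, T))  # True 3/5
```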



3 Period Assignment Algorithm 

Due to the harmonicity constraints and the objective function, the problem is a
nonlinear optimization problem. To find solutions in a reasonable amount of
time, we propose an approximation algorithm which consists of two heuristic
steps to minimize the utilization.

Fig. 1. Task graph.



3.1 Optimal GCD assignment 

Our first heuristic is the optimal GCD (greatest common divisor) assignment. It
is a backward period assignment method in that a non-tail task $p_i$ gets the period
$T_i = \mathrm{GCD}\{T_j \mid p_j \in V^i_{succ}\}$. This period assignment starts from the
non-tail tasks that are the immediate predecessors of tail tasks and is iteratively
applied to their predecessors until all non-tail tasks are assigned their periods.
As a result of the optimal GCD assignment, a producer task always gets the
largest possible period which is harmonic to the periods of its consumer tasks.
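The GCD assignment is a single backward pass over the task graph. The sketch below assumes the tasks are given in topological order and the graph as a consumer-list dictionary; the function name and the example graph are hypothetical.

```python
from functools import reduce
from math import gcd


def gcd_assignment(topo_order, succ, tail_periods):
    """Backward GCD assignment: T_i = GCD{T_j | p_j in V^i_succ}.

    topo_order lists every task with producers before consumers, so walking
    it in reverse visits all consumers of a task before the task itself.
    """
    T = dict(tail_periods)
    for p in reversed(topo_order):
        consumers = succ.get(p, ())
        if consumers:
            T[p] = reduce(gcd, (T[q] for q in consumers))
    return T


# Hypothetical graph: a -> b, a -> c, b -> d, b -> e; tails c, d, e.
succ = {"a": ["b", "c"], "b": ["d", "e"]}
print(gcd_assignment(["a", "b", "c", "d", "e"], succ, {"c": 9, "d": 12, "e": 18}))
# {'c': 9, 'd': 12, 'e': 18, 'b': 6, 'a': 3}
```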

Theorem 1 proves that the optimal GCD assignment is a necessary step to 
obtain the optimal period assignment of a whole system. 

Theorem 1. If all tail tasks are given the optimal periods, the GCD assignment
always finds the optimal period assignment for all tasks.

Proof. We prove the theorem by contradiction. Suppose that the optimal solution
has an intermediate task $p_i$ with a non-GCD period $T_i$. Let
$T^{GCD}_i = \mathrm{GCD}\{T_j \mid p_j \in V^i_{succ}\}$. Due to the harmonicity constraint, $T_i$ is a common
divisor of the periods of the tasks in $V^i_{succ}$; thus $T_i \mid T^{GCD}_i$, and since $T_i \neq T^{GCD}_i$,
we have $T_i < T^{GCD}_i$. Hence, $T_i$ can be replaced by $T^{GCD}_i$ without affecting the
periods of other tasks; harmonicity with the producers of $p_i$ is preserved, because any
divisor of $T_i$ also divides $T^{GCD}_i$. This yields a lower utilization, which contradicts the
assumption. Therefore, the optimal solution is obtained by the GCD assignment method. □

Theorem 1 helps significantly reduce the problem size, since it allows us to 
focus only on tail tasks for the optimal period assignment. 

3.2 Harmonic Period Assignment for Tail Tasks 

Our second heuristic is the harmonic period assignment for tail tasks. In order 
to minimize the utilization, it is desirable to assign tail tasks the largest possible 
harmonic periods so that large GCD values are assigned to non-tail tasks. To do 
so, the algorithm introduces additional harmonicity constraints among the tail
tasks. It then assigns tail tasks harmonic periods that satisfy these extra constraints.

This heuristic step works as follows. First, tail tasks are sorted in nondecreasing
order of $T^u_i$. Second, a harmonicity constraint is established between each pair of
adjacent tasks in the list. We can represent these additional harmonicity
constraints by slightly modifying a given task graph. Figure 2 shows the task graph
of Figure 1 after it is extended with the extra harmonicity constraints.



Fig. 2. Modified task graph with harmonicity constraints between tail tasks.



Due to the harmonicity constraints between tail tasks, the original problem
is reduced to the problem of assigning a period to only the tail task with the
smallest $T^u_i$.

The entire algorithm is presented below. In Step (1), it sorts tail tasks in
nondecreasing order of their maximum period constraints. In Step (2), the algorithm
chooses a period value for the tail task with the smallest maximum period
constraint. In order to bound the worst-case performance of the algorithm, $T_1$ must
be chosen between $\lceil T^u_1/2 \rceil$ and $T^u_1$; the reason for this requirement is given
in the next section. In Step (3), the algorithm assigns the largest harmonic periods
to the other tail tasks according to the extra harmonicity constraints imposed on them.
Finally, in Step (4), it performs GCD period assignment for non-tail tasks.

Period Assignment Algorithm

{

(1) Let $(p_1, p_2, \ldots, p_m)$ be a sorted list of $V_{tail}$
    such that $T^u_1 \le T^u_2 \le \cdots \le T^u_m$.

(2) Choose any $T_1$ such that $\lceil T^u_1/2 \rceil \le T_1 \le T^u_1$.

(3) $T_i = \lfloor T^u_i / T_{i-1} \rfloor \cdot T_{i-1}$ for $2 \le i \le m$.

(4) Perform GCD assignment for non-tail tasks.

}
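The four steps translate almost line for line into code. This is a hedged Python sketch, not the authors' implementation: the input format (dictionaries keyed by task name, a topological order supplied by the caller) and the default choice $T_1 = T^u_1$ are our assumptions; Step (2) allows any value in the stated range.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def assign_periods(C, succ, T_max, topo_order, T1=None):
    """Steps (1)-(4) of the period assignment algorithm (sketch).

    C: execution time of every task; succ: consumers of each task;
    T_max: maximum period T^u of every tail task; topo_order: all tasks,
    producers first; T1: period for the first sorted tail task
    (defaults to its upper bound T^u_1).
    """
    tails = sorted(T_max, key=T_max.get)                 # Step (1)
    upper = T_max[tails[0]]
    T = {tails[0]: upper if T1 is None else T1}          # Step (2)
    if not (upper + 1) // 2 <= T[tails[0]] <= upper:
        raise ValueError("T1 must lie in [ceil(T^u_1 / 2), T^u_1]")
    for prev, cur in zip(tails, tails[1:]):              # Step (3)
        T[cur] = (T_max[cur] // T[prev]) * T[prev]
    for p in reversed(topo_order):                       # Step (4)
        if succ.get(p):
            T[p] = reduce(gcd, (T[q] for q in succ[p]))
    U = sum(Fraction(C[p], T[p]) for p in topo_order)
    return T, U


succ = {"a": ["b", "c"], "b": ["d", "e"]}   # hypothetical task graph
C = {p: 1 for p in "abcde"}
T, U = assign_periods(C, succ, {"c": 9, "d": 12, "e": 18}, list("abcde"))
print(T, U)  # {'c': 9, 'd': 9, 'e': 18, 'b': 9, 'a': 9} 1/2
```

Choosing $T_1 = T^u_1$ is only one option; a smaller $T_1$ in the allowed range can change which harmonic chain the remaining tail tasks follow.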
4 Algorithm Analysis



In this section we make the worst case analysis of the proposed algorithm and 
summarize it in Theorem 2. It states that the algorithm, even in the worst case, 
yields a solution whose utilization does not exceed twice the optimal utilization. 



Theorem 2. Let $U_{alg}$ be the utilization computed by the algorithm, and $U_{opt}$ the
optimal utilization. Then $U_{alg} \le 2U_{opt}$ always holds.

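Proof. Let $(p_1, p_2, \ldots, p_m)$ be a sorted list of $V_{tail}$. Suppose that $(T_1, \ldots, T_m)$
is a period assignment for tail tasks. Due to Theorem 1, the period of each non-tail
task in $V$ can be expressed as the GCD of some subset of $\{T_1, \ldots, T_m\}$. Thus,
utilization $U$ can be generally written as follows.

$$U = \frac{e_1}{T_1} + \frac{e_2}{T_2} + \cdots + \frac{e_m}{T_m} + \frac{e_{m+1}}{\mathrm{GCD}(T_1, T_2)} + \frac{e_{m+2}}{\mathrm{GCD}(T_1, T_3)} + \cdots + \frac{e_{2^m-1}}{\mathrm{GCD}(T_1, T_2, \ldots, T_m)} \qquad (1)$$

where $e_i = \sum_{p_j \in G_i} C_j$ and $G_i$ is the set of tasks $p_j$ that have a path to every tail
task appearing in the denominator of the $i$-th term, but to no other. Clearly, if $G_i$ is empty,
$e_i$ is zero.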

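For the analysis of the algorithm, the lower bound $U_{low}$ of $U$ is derived from
Eq. (1) by replacing its denominators with the maximum possible periods, as follows.

$$U_{low} = \frac{e_1}{T^u_1} + \frac{e_2}{T^u_2} + \cdots + \frac{e_m}{T^u_m} + \frac{e_{m+1}}{T^u_1} + \cdots + \frac{e_{2^m-1}}{T^u_1} \qquad (2)$$

Here, the denominator of each GCD term becomes $T^u_j$, where $j$ is the smallest index among
the tail tasks in that term, since $\mathrm{GCD}(T_j, \ldots) \le T_j \le T^u_j$.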



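Since the proposed algorithm imposes the harmonicity constraints on $V_{tail}$,
$\mathrm{GCD}(T_i, \ldots, T_j) = T_i$ for $1 \le i \le j \le m$. Thus, the utilization $U_{alg}$ computed by
the algorithm can be derived from Eq. (1), as follows.

$$U_{alg} = \frac{e_1}{T_1} + \frac{e_2}{T_2} + \cdots + \frac{e_m}{T_m} + \frac{e_{m+1}}{T_1} + \frac{e_{m+2}}{T_1} + \cdots + \frac{e_{2^m-1}}{T_1} \qquad (3)$$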



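Now Eq. (2) and Eq. (3) are compared term by term. Let $u^{alg}_i$ and $u^{low}_i$ be the
$i$-th terms in $U_{alg}$ and $U_{low}$, respectively.

(a) For $i = 1$: $u^{alg}_1 = e_1/T_1 \le 2e_1/T^u_1 = 2u^{low}_1$, since $T_1$ is assigned a period
no less than $T^u_1/2$ in the proposed algorithm.

(b) For $2 \le i \le 2^m - 1$ and $e_i \neq 0$: the denominators of $u^{alg}_i$ and $u^{low}_i$ are $T_j$ and
$T^u_j$, respectively, for some $1 \le j \le m$. If $j = 1$, the argument of (a) gives
$u^{alg}_i \le 2u^{low}_i$. Otherwise,

$$\frac{u^{alg}_i}{u^{low}_i} = \frac{T^u_j}{T_j} = \frac{T^u_j}{\lfloor T^u_j / T_{j-1} \rfloor \cdot T_{j-1}}.$$

Since $x / \lfloor x \rfloor < 2$ for $x \ge 1$, and $x = T^u_j / T_{j-1} \ge 1$ because
$T_{j-1} \le T^u_{j-1} \le T^u_j$, we have $u^{alg}_i < 2u^{low}_i$.

The above comparison leads to $U_{alg} \le 2U_{low}$. Let $U_{opt}$ be the optimal utilization;
then clearly $U_{low} \le U_{opt}$. Combining these, we have

$$U_{alg} \le 2U_{opt}.$$

This proves the theorem. □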

For a task graph with $n$ sorted tasks (tail tasks are sorted in nondecreasing
order of $T^u_i$ and non-tail tasks are topologically sorted), $O(n)$ period assignment
steps are required by the proposed algorithm. Note that the GCD assignment
step in the algorithm is simplified by the manipulation shown in Eq. (3):
for a GCD assignment, $\mathrm{GCD}(T_i, \ldots, T_j) = T_i$ can be used.

5 Empirical Performance 

From the previous discussion, we know that the performance ratio of the proposed
algorithm to the optimal one is at most 2.0. However, this result is derived
from a worst-case analysis. In this section, we perform an empirical
study to show that the performance of the algorithm is very close to the optimal
one in most cases of practical interest.

To do so, we have implemented both the proposed algorithm and an optimal, 
exhaustive search algorithm. Then, we have generated artificial workloads using 
five types of representative task graphs that are commonly encountered in algo- 
rithmic structures of control applications. The task graph structures considered 
here are in-tree, out-tree, fork-join, Laplace equation solver [10], and FFT (fast 
Fourier transform) [2] types. Figure 3 pictorially shows these task graph types.
For each of them, we generated three distinct task graphs by varying the num- 
ber of tasks and their maximum period constraints. For the in-tree, fork-join, 
and Laplace equation solver task graphs that possess only one tail task, we have 
added two or more tail tasks to the task graphs to prevent the algorithms from 
generating trivial period assignments. Due to the inherent time complexity of the 
exhaustive search algorithm, the experiments were carried out with small-sized 
problems possessing 10 to 20 tasks. 
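By Theorem 1, an exhaustive search only needs to enumerate tail-task periods, since the non-tail periods then follow from the GCD assignment. The following Python sketch is one way to write such a baseline, not necessarily the search used in the experiments; its cost is the product of the $T^u_i$ values, which is why only small problems are tractable. The five-task graph is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd


def utilization_from_tails(C, succ, topo_order, tail_periods):
    """Complete an assignment by GCD assignment (Theorem 1) and return U."""
    T = dict(tail_periods)
    for p in reversed(topo_order):
        if succ.get(p):
            T[p] = reduce(gcd, (T[q] for q in succ[p]))
    return sum(Fraction(C[p], T[p]) for p in topo_order)


def exhaustive_optimum(C, succ, topo_order, T_max):
    """Try every tail-period vector with 1 <= T_i <= T^u_i; keep the minimum U."""
    tails = list(T_max)
    ranges = [range(1, T_max[t] + 1) for t in tails]
    return min(utilization_from_tails(C, succ, topo_order, dict(zip(tails, combo)))
               for combo in product(*ranges))


succ = {"a": ["b", "c"], "b": ["d", "e"]}
C = {p: 1 for p in "abcde"}
print(exhaustive_optimum(C, succ, list("abcde"), {"c": 9, "d": 12, "e": 18}))  # 1/2
```

For this graph the search evaluates 9 × 12 × 18 = 1944 tail-period vectors before returning U = 1/2.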

To generate the workloads, we varied task execution times and maximum
period constraints for each task graph type. For the maximum period constraints
$T^u_i$ of tail tasks, we used normal probability distributions with mean $\mu$ and
standard deviation $\sigma$. For each task graph type, three distinct test cases were
generated using $N(600, 300^2)$, $N(500, 250^2)$, and $N(400, 200^2)$. Task
execution times were generated using $N(10, 5^2)$ in all test cases.
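Workload generation of this kind takes only a few lines. In the Python sketch below, the numbers of tail tasks and tasks, the seed, and the rounding and clamping of samples are our assumptions; the text does not say how non-integer or non-positive samples were handled.

```python
import random


def sample_positive_ints(count, mu, sigma, rng):
    """Draw count values from N(mu, sigma^2), rounded to integers and clamped to >= 1.

    Rounding and clamping are assumptions: a normal distribution can produce
    non-integer and non-positive values, which are not valid periods or times.
    """
    return [max(1, round(rng.gauss(mu, sigma))) for _ in range(count)]


rng = random.Random(1999)                                 # fixed seed for repeatability
for mu, sigma in [(600, 300), (500, 250), (400, 200)]:
    T_max = sample_positive_ints(4, mu, sigma, rng)       # tail-task bounds
    C = sample_positive_ints(12, 10, 5, rng)              # execution times
    print(f"N({mu}, {sigma}^2): T_max={T_max}  C={C}")
```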
Fig. 3. Task graph types: (A) in-tree, (B) out-tree, (C) fork-join, (D) Laplace
equation solver, and (E) FFT.



Figure 4 summarizes the results of our experiments. As is clear from Figure 4,
the proposed algorithm yields solutions which are very close to the optimal
ones in all cases. On average, the performance ratio is 1.0330 in our
experiments.



6 Conclusion 

We have presented a period assignment algorithm which requires $O(n)$ period assignment steps and is capable of finding solutions close to the optimal ones. We have formally proved that the proposed algorithm has a performance ratio $< 2$ and experimentally shown that it yields almost optimal solutions in practice. During the experiments, the performance ratio was 1.0330 on average.

The proposed algorithm can be used as an essential component of the real-time system design methodology formulated in [3]. Since the methodology is applicable to real-time systems built on top of distributed platforms, we are currently extending our algorithm to such platforms.







Minsoo Ryu and Seongsoo Hong 





[Figure: panels (A) to (E), one per task graph type; x-axis: normal pdf of the maximum period, with (mean, standard deviation) = (600, 300), (500, 250), (400, 200)]
Fig. 4. Performance comparisons: (A) in-tree, (B) out-tree, (C) fork-join, (D) Laplace equation solver, and (E) FFT.
Abstract. Traditionally, real-time systems require that the deadlines of all jobs be met. For many applications, however, this is an overly stringent requirement. An occasional missed deadline may cause decreased performance but is nevertheless acceptable. We present an analysis technique that determines a lower bound on the percentage of deadlines that a periodic task meets, and we compare the lower bound with simulation results for an example system. We have implemented the technique in the PERTS real-time system prototyping environment [6, 7].



1 Introduction 

A distinguishing characteristic of real-time computer systems is the requirement 
that the system meet its temporal constraints. While there are many different 
types of constraints, the most common form is expressed in terms of deadlines: 
a job completes its execution by its deadline. In a hard real-time system, all jobs 
must meet their deadlines and a missed deadline is treated as a fatal fault. Hence 
hard real-time systems are designed to ensure that there are no missed deadlines, 
often at the expense of resource utilization and average performance. Hard real-time systems are most often found in safety- or mission-critical applications.

The last few years have seen the proliferation of applications known as soft 
real-time systems. Examples include telecommunications and signal processing 
systems. For these systems, missed deadlines result in performance degradation. 
However, provided that the frequency of missed deadlines is below some thresh- 
old, the real-time performance of such a system is nevertheless acceptable. While 
many techniques for designing and validating hard real-time systems exist, there 
are few such techniques for soft real-time systems. In this paper, we present a 
schedulability analysis technique for fixed-priority systems to determine lower 
bounds on the frequency of missed deadlines and compare the lower bound with 
simulation results for an example system. 

We begin, in the next section, with a brief review of schedulability analysis techniques for validating hard real-time systems, motivate the need for better techniques to analyze soft real-time systems, and describe closely related work. In Section 3, we present the Stochastic Time Demand Analysis (STDA) technique and show how it allows a designer greater freedom in trading the certainty with which deadlines are met for other design goals. We compute a lower bound on the probability that deadlines are met by jobs in a simple system and compare the bound with the percentage of deadlines met obtained by simulation. Section 4 briefly discusses issues that we encountered while implementing STDA in the PERTS real-time system prototyping environment, and Section 5 discusses possible directions for future research.



2 Background and Related Work 

The periodic task model [5] has proven useful in describing the characteristics of real-time systems. It is the foundation of state-of-the-art techniques for analyzing the behavior of hard real-time systems. According to this model, a real-time system consists of a set of tasks, each of which consists of a (possibly) infinite stream of computations or communications, called jobs. We denote the $i$th task of the system by $T_i$ and the $j$th job of the task (or the $j$th job since some time instant) by $J_{i,j}$. The execution time of a job is the amount of time the job takes to complete if it executes alone. All the jobs in a task have a common minimum (maximum) execution time, denoted $E_i^-$ ($E_i^+$). Moreover, the jobs are released for execution (i.e., arrive) with a common minimum inter-release time. The minimum inter-release time (or inter-arrival time) is called the period of the task. The period of each task $T_i$ is larger than zero and is denoted by $P_i$. A job $J_{i,j}$ becomes ready for execution at its release time, $r_{i,j}$. It must complete execution by its absolute deadline, $d_{i,j}$, or it is said to have missed its deadline. Figure 1 shows these quantities in the context of a time-line. The length of time between the release time and absolute deadline of every job in each task $T_i$ is constant. This length is called the relative deadline of the task and is denoted $D_i = d_{i,j} - r_{i,j}$. The completion time of $J_{i,j}$ is denoted $C_{i,j}$, and its response time is $\rho_{i,j} = C_{i,j} - r_{i,j}$.

In this paper, we will have occasion to refer to the actual execution time of $J_{i,j}$, which we denote $e_{i,j}$. The maximum utilization of the task is the ratio of the maximum execution time to the minimum inter-arrival time (period) and is denoted by $U_i^+ = E_i^+ / P_i$. Finally, the release time of the first job in a task is called the phase of the task. We say that tasks are in-phase when they have identical phases.




Fig. 1. Time-line for task $T_i$
In modern real-time systems, tasks are scheduled in a priority-driven manner. At any point in time, the ready job with the highest priority executes. If, at time $t$, a job of higher priority becomes ready, the executing job is preempted and the higher-priority job executes. Most priority assignments are fixed-priority: according to a fixed-priority scheduling policy, all jobs in a task have the same priority. We denote the priority of task $T_i$, and hence the priority of jobs $J_{i,1}, J_{i,2}, \ldots$, by $\phi_i$. For convenience and without loss of generality, we assume that priorities are distinct and index the tasks in order of decreasing priority, so that $T_1$ has a higher priority than $T_2$, etc.

2.1 Deterministic Schedulability Analysis Methods 

A task in a system is said to be schedulable if all jobs in the task meet their deadlines. A system of real-time tasks is schedulable if all tasks in the system are schedulable. One of the most commonly used fixed-priority assignments is Rate Monotonic (RM). According to this policy, the shorter the period $P_i$ of a task, the higher its priority. It is shown in [5] that a system of $n$ tasks scheduled on an RM basis is schedulable if the sum of the maximum utilizations of the tasks, denoted $U$, satisfies the inequality $U \le n(2^{1/n} - 1)$. The expression on the right-hand side of the inequality is often called the Liu and Layland bound. The Liu and Layland bound gives a sufficient, and hence conservative, condition: a system may be schedulable rate monotonically even though its maximum utilization exceeds the Liu and Layland bound.
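A minimal Python sketch of the Liu and Layland test follows, applied to the periods and maximum execution times of Example #1 (Table 1). The helper name `liu_layland_bound` is ours, not from [5].

```python
def liu_layland_bound(n: int) -> float:
    """Liu and Layland utilization bound n(2^(1/n) - 1) for n tasks under RM."""
    return n * (2 ** (1 / n) - 1)


# (P_i, E_i^+) for the three tasks of Example #1 (Table 1)
example1 = [(300, 100), (400, 100), (600, 200)]

U = sum(e_max / p for p, e_max in example1)   # maximum total utilization
bound = liu_layland_bound(len(example1))
print(f"U = {U:.3f}, bound = {bound:.3f}, passes test: {U <= bound}")
# U = 0.917, bound = 0.780, passes test: False
```

Because the test is only sufficient, failing it is inconclusive; time demand analysis is needed to establish that this system is in fact schedulable.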

The Time Demand Analysis (TDA) method [4] provides a more accurate and general characterization of the ability of arbitrary fixed-priority systems to meet all deadlines. It is based upon the observation that the worst-case response time of a job occurs when it is released at a critical instant. For a system of independent preemptive periodic tasks scheduled on a fixed-priority basis, a critical instant of a task occurs when a job in the task is released along with a job from every task of equal or higher priority [5]. Therefore, to bound the worst-case response time of all the jobs in a task $T_i$, it suffices for us to look at a job that is released at a critical instant. We call this job $J_{i,1}$. The time demand function of $T_i$, denoted $w_i(t)$, is the total maximum time demanded by $J_{i,1}$, as well as by all the jobs that complete before $J_{i,1}$, as a function of time $t$ since the release of $J_{i,1}$. It is a function that increases by the maximum execution time $E_k^+$ every time a higher-priority job $J_{k,l}$ is released. If there is a $t \le D_i$ such that $w_i(t) \le t$, then no job in $T_i$ will miss its deadline.
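For $0 < t \le P_i$ (the case considered here, with $D_i \le P_i$), the time demand function has the familiar closed form $w_i(t) = E_i^+ + \sum_{k=1}^{i-1} \lceil t/P_k \rceil E_k^+$. The sketch below, with hypothetical helper names, finds the smallest $t$ satisfying $w_i(t) \le t$ by fixed-point iteration and applies it to the parameters of Example #1 in Table 1. It illustrates the test and is not the implementation from [4].

```python
import math


def time_demand(tasks, i, t):
    """w_i(t): maximum demand of J_{i,1} plus higher-priority jobs released in [0, t)."""
    _, e_i, _ = tasks[i]
    return e_i + sum(math.ceil(t / p_k) * e_k for p_k, e_k, _ in tasks[:i])


def smallest_supply_point(tasks, i):
    """Smallest t with w_i(t) <= t, or None if no such t exists up to D_i.

    tasks: list of (P, E_max, D) in order of decreasing priority.
    """
    _, e_i, d_i = tasks[i]
    t = e_i + sum(e_k for _, e_k, _ in tasks[:i])   # lower bound on the answer
    while True:
        w = time_demand(tasks, i, t)
        if w <= t:
            return t        # time supply meets the demand at t
        if w > d_i:
            return None     # the smallest solution lies beyond the deadline
        t = w               # w_i is non-decreasing, so jump ahead to w


example1 = [(300, 100, 300), (400, 100, 400), (600, 200, 600)]   # Table 1
print([smallest_supply_point(example1, i) for i in range(3)])    # [100, 200, 600]
```

Starting from the sum of the execution times and jumping to $w_i(t)$ never overshoots the smallest solution, because $w_i$ is non-decreasing; the iteration can therefore stop as soon as the demand exceeds $D_i$.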

Figure 2 shows the time demand function for each of the tasks in Example #1. The parameters of the tasks are listed in Table 1.¹ The time supply meets the demand of tasks $T_1$, $T_2$ and $T_3$ by 100, 200 and 600, respectively. A schedule of the system with the initial job in each task released at a critical instant is shown in Fig. 3. Even though the processor is idle from 1100 to 1200, it is clear that increasing the maximum execution time of any task will result in $J_{3,1}$ missing its deadline at 600.

¹ We note that the maximum total utilization of the tasks is 0.92, greater than the Liu and Layland bound of 0.78. However, the system is schedulable.
Table 1. Parameters of the Tasks in Example #1.

| $T_i$ | $\phi_i$ | $P_i$ | $E_i^+$ | $D_i$ | $U_i^+$ |
|---|---|---|---|---|---|
| $T_1$ | 1 | 300 | 100 | 300 | 0.333 |
| $T_2$ | 2 | 400 | 100 | 400 | 0.250 |
| $T_3$ | 3 | 600 | 200 | 600 | 0.333 |
| Total | | | | | 0.917 |




Fig. 2. Time Demand Analysis of the Example System

Fig. 3. Schedule of the Example System
The version of TDA given above works only when every job completes by the release of the next job in its task, which is the case for the example. To determine whether all jobs in $T_i$ meet their deadlines when some job $J_{k,l+1}$ may be released before the previous job $J_{k,l}$ in a higher-priority task $T_k$ completes, we must compute the worst-case bound on the response times of all jobs in $T_i$ executed in a level-$\phi_i$ busy interval that begins at an instant when a job $J_{i,1}$ in $T_i$ is released at the same time as a job in every higher-priority task.² (A level-$\phi_i$ busy interval is an interval of time that begins at an instant when a job in $T_i$ or a higher-priority task is released and immediately before which no job in those tasks is ready for execution. It ends at the first time instant $t$ at which all jobs in $T_i$ and higher-priority tasks released before $t$ have completed.) We call such a busy interval an in-phase level-$\phi_i$ busy interval.

Analogous to the critical instant analysis in [5], it has been shown in [3] that it suffices for us to consider only an in-phase level-$\phi_i$ busy interval. The reasons are:

1. if a job in $T_i$ is ever released at the same time as a job in every higher-priority task, that instant is the beginning of an in-phase busy interval (i.e., the system has no backlog at that instant),

2. an in-phase level-$\phi_i$ busy interval is no shorter than a level-$\phi_i$ busy interval that is not in-phase (and hence no fewer jobs in $T_i$ are released in the in-phase busy interval), and

3. the response time of every job in a level-$\phi_i$ busy interval that is not in-phase is no greater than the response time of the corresponding job in an in-phase level-$\phi_i$ busy interval.

For these reasons, if all jobs in an in-phase level-$\phi_i$ busy interval meet their deadlines, the task is schedulable [3]. Stochastic Time Demand Analysis, described in Section 3, uses this generalization of TDA.

We know from the above analysis that the system of tasks in Table 1 is schedulable. However, suppose that a significantly less expensive processor is available which is half as fast. The profitability of the product would be greatly enhanced if the slower processor could be used. Using the slower processor, the execution times double but the periods do not change, because they are determined by the environment. Thus the maximum total utilization doubles, from 0.917 to about 1.83. The hard real-time analysis techniques discussed earlier tell us whether or not a deadline will be missed, but not how often. Although we may be willing to trade occasional missed deadlines for the use of the slower processor, we are unable to do so based on available hard real-time techniques. A different approach is needed.

2.2 Probabilistic Approaches 

We are aware of only two other techniques that exploit information about the statistical behavior of periodic tasks to facilitate better design of soft real-time systems: Probabilistic Time Demand Analysis (PTDA) [10] and Statistical Rate Monotonic Scheduling (SRMS) [1].

² This instant is still called a critical instant in the literature, but it does not match the original definition of a critical instant, since $J_{i,1}$ no longer has the longest response time among all jobs in $T_i$.

Like the proposed method, PTDA attempts to provide a lower bound on the probability that jobs in a task will complete in time. It is a straightforward extension of TDA in which the time demand is computed by convolving the probability density functions of the execution times instead of summing the maximum execution times as in TDA. PTDA assumes that the relative deadlines of all tasks are less than or equal to their periods. It computes a lower bound on the probability that jobs in a task complete in time by determining the probability that the time supply equals or exceeds the time demand at the deadline of the first job in the task. This assumption is not valid, especially when the average utilization of the system approaches one.
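To make the comparison concrete, here is a minimal sketch of the PTDA computation as described above. It assumes integer-valued execution times given as probability mass functions indexed by time (our assumption; [10] may discretize differently), and the function name `ptda_bound` is hypothetical. The demand at the deadline $D_i$ is the convolution of $T_i$'s execution-time pmf with one copy of $T_k$'s pmf for each of the $\lceil D_i/P_k \rceil$ higher-priority jobs released in $[0, D_i)$.

```python
import math

import numpy as np


def ptda_bound(tasks, i):
    """P[demand at D_i <= D_i] for the first job of task i, PTDA-style.

    tasks: list of (P, D, pmf) in decreasing priority, where pmf[k] = P[E = k]
    for integer execution times k (assumed discretization).
    """
    _, d_i, pmf_i = tasks[i]
    demand = np.asarray(pmf_i, dtype=float)
    for p_k, _, pmf_k in tasks[:i]:
        for _ in range(math.ceil(d_i / p_k)):      # higher-priority releases in [0, D_i)
            demand = np.convolve(demand, pmf_k)    # sum of independent variables
    return float(demand[: d_i + 1].sum())          # mass at demand values <= D_i


# Hypothetical toy system: T_1 has E in {1, 2}, T_2 has E in {2, 3}, equally likely
toy = [(4, 4, [0, 0.5, 0.5]), (8, 6, [0, 0, 0.5, 0.5])]
print(ptda_bound(toy, 1))   # 0.875: only E_2 + E_1 + E_1' = 3 + 2 + 2 exceeds 6
```

Only the first job of the task is examined; any backlog carried into later jobs is not accounted for.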

SRMS is an extension to classical Rate Monotonic Scheduling (RMS). Its 
primary goal is to schedule tasks with highly variable execution times in such a 
way that the portion of the processor time allocated to each task is met on the 
average. Variable execution times are “smoothed” by aggregating the executions 
of several jobs in a task and allocating an execution time budget for the aggregate 
(which may be proportional to the original). A job is released only if its task has sufficient budget to complete in time and if higher-priority jobs will not prevent its timely completion. All other jobs are dropped. The analysis given in [1] can only be used to compute the percentage of jobs in each task that will be released for execution (and hence complete in time). Moreover, it is applicable only when the periods of the tasks are related in a harmonic way, i.e., each larger period $P_j$ is an integer multiple of every smaller period $P_i$. The method presented here seeks to provide a lower bound on the percentage of jobs that meet their deadlines when all jobs are released. It is not restricted to harmonic systems or to the RM scheduling policy.
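The harmonic-period condition that SRMS requires is easy to check mechanically. A minimal sketch, assuming integer periods (the helper `is_harmonic` is ours, not from [1]):

```python
def is_harmonic(periods):
    """True if every larger period is an integer multiple of every smaller one.

    Divisibility is transitive, so checking adjacent periods in sorted order suffices.
    """
    ps = sorted(periods)
    return all(larger % smaller == 0 for smaller, larger in zip(ps, ps[1:]))


print(is_harmonic([100, 200, 400]))   # True
print(is_harmonic([300, 400]))        # False: the periods of Example #2 (Table 2)
```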

3 Stochastic Time Demand Analysis 

In this section we describe an algorithm, called Stochastic Time Demand Analysis (STDA), which computes a lower bound on the probability that jobs in each task will meet their deadlines. We also compare the bound with the average behavior of a system as determined by simulation.

Consider the execution of a task $T_i$. Let $J_{i,j}$ be the $j$th job in $T_i$ released in a level-$\phi_i$ busy interval. To simplify the discussion, and without loss of generality, we take the beginning of this interval as the time origin. The response time $\rho_{i,j}$ of job $J_{i,j}$ is a function of the execution times of all jobs that can execute in the interval $[r_{i,j}, C_{i,j})$. As in the deterministic analysis, we use the minimum inter-release time in our analysis. However, the execution times of tasks are random variables; hence the response time of each job in a task is a random variable. Our analysis assumes that the execution time $E_i$ of a job in $T_i$ is statistically independent of that of other jobs in $T_i$ and of jobs in other tasks. Again, because a job may not complete by the release of the subsequent job in the same task, we must consider all jobs in a level-$\phi_i$ busy interval, and note that the length of a level-$\phi_i$ busy interval is also a random variable. Bounding the length of a level-$\phi_i$ busy interval is key to STDA. First we show how to compute the response time distribution of jobs in task $T_i$.

Let $w_{i,j}(t)$ denote the time demand of all jobs that execute in the interval $[r_{i,j}, t)$. Job $J_{i,j}$ completes when there is sufficient time to meet the demand, i.e., when $w_{i,j}(t) = t$. Let $W_{i,j}(t) = \mathcal{P}[w_{i,j}(t) \le t]$ denote the probability that the time demand up to $t$ is met at $t$, given that the busy interval has not ended. We note that $W_{i,j}(t)$ is also the probability that the response time of $J_{i,j}$ is less than or equal to $t$. The probability that $J_{i,j}$ meets its deadline is therefore at least $W_{i,j}(D_i)$. We now turn our attention to computing $W_{i,j}(t)$.

Consider a task $T_i$ from the system. The response time distribution $W_{i,j}(t)$ is computed by conditioning on whether or not a backlog of work from equal- or higher-priority tasks exists when $J_{i,j}$ is released. If no backlog exists, a level-$\phi_i$ busy interval starts at the release of $J_{i,j}$ (which we relabel $J_{i,1}$) and

$$W_{i,1}(t) = \mathcal{P}[w_{i,1}(t) \le t] . \qquad (1)$$

Otherwise, the response time distributions for the remaining jobs of $T_i$ in the busy interval are computed in order of their release by

$$W_{i,j}(t) = \mathcal{P}[w_{i,j}(t) \le t \mid w_{i,j-1}(r_{i,j}) > r_{i,j}] . \qquad (2)$$

For the highest priority task, the response time distribution of the first job in 
a busy interval is the same as its execution time distribution. The response 
time distribution of the subsequent job in the busy interval is computed by 
convolving the execution time distribution of the task with the distribution of 
the backlog obtained by conditioning. This process continues until the end of 
the busy interval. Equations 1 and 2 are also used to compute the response time 
distributions of the remaining tasks in the system. 
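For the highest-priority task, this process can be sketched in a few lines if it is written unconditionally: instead of conditioning on a backlog, it mixes in the no-backlog case (which starts a new busy interval with response time $E_i$), so the distribution of successive jobs follows $\rho_{i,j+1} = \max(\rho_{i,j} - P_i, 0) + E_i$. This minimal sketch assumes integer-valued execution times given as a probability mass function and releases exactly $P_i$ apart; the function name and the example task are hypothetical, not from the paper. It stops once the next release surely finds no backlog (the busy interval has surely ended) or after a fixed number of jobs.

```python
import numpy as np


def highest_priority_response_pmfs(exec_pmf, period, max_jobs=50):
    """Response-time pmfs of successive jobs of the highest-priority task.

    exec_pmf[k] = P[E_i = k] for integer k (assumed discretization).
    Returns [pmf of rho_{i,1}, pmf of rho_{i,2}, ...], stopping when the busy
    interval has surely ended or after max_jobs jobs.
    """
    exec_pmf = np.asarray(exec_pmf, dtype=float)
    resp = exec_pmf.copy()                      # first job: response time = E_i
    pmfs = []
    for _ in range(max_jobs):
        pmfs.append(resp)
        tail = resp[period + 1:]                # mass of response times > P_i
        if tail.sum() < 1e-12:                  # surely done before the next release
            break
        # Backlog at the next release, max(rho - P_i, 0): index 0 holds the
        # no-backlog probability, index b the probability of b units of backlog.
        backlog = np.concatenate(([resp[: period + 1].sum()], tail))
        resp = np.convolve(backlog, exec_pmf)   # backlog + own execution time
    return pmfs


# Hypothetical task: E uniform on {1, ..., 5}, P = D = 4
pmfs = highest_priority_response_pmfs([0, 0.2, 0.2, 0.2, 0.2, 0.2], period=4, max_jobs=3)
print([round(float(p[:5].sum()), 3) for p in pmfs])   # [0.8, 0.76, 0.744]
```

The printed values are the probabilities that the first three jobs meet a deadline of 4.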

We now compute $W_{i,j}(t)$ for $j > 1$. Clearly, jobs with a priority higher than $\phi_i$ can execute in the interval $[r_{i,j}, C_{i,j})$. Jobs among $J_{i,1}, J_{i,2}, \ldots, J_{i,j-1}$ that complete after $r_{i,j}$ also execute in this interval. Their effect is taken into account in the conditioning process. To compute $W_{i,j}(t)$, we must still take into account the time demand of jobs of higher-priority tasks released in the interval $[r_{i,j}, C_{i,j})$. This is done by dividing $[r_{i,j}, C_{i,j})$ into sub-intervals separated by releases of higher-priority jobs and conditioning on whether a backlog of work exists at the start of each sub-interval. For example, suppose that only one higher-priority job $J_{k,l}$ is released in the interval $[r_{i,j}, C_{i,j})$, dividing the interval into two sub-intervals, $[r_{i,j}, r_{k,l})$ and $[r_{k,l}, C_{i,j})$. The probability that $J_{i,j}$ will complete by time $t$ before $r_{k,l}$ is

$$W_{i,j}(t) = \mathcal{P}[w_{i,j}(t) \le t \mid w_{i,j-1}(r_{i,j}) > r_{i,j}] , \qquad (3)$$

i.e., for $t$ in the first sub-interval, and is

$$W_{i,j}(t) = \mathcal{P}[w_{i,j}(t) \le t \mid w_{i,j-1}(r_{i,j}) > r_{i,j},\ w_{i,j}(r_{k,l}) > r_{k,l}]\ \mathcal{P}[w_{i,j}(r_{k,l}) > r_{k,l}] , \qquad (4)$$

for $t$ in the second sub-interval $[r_{k,l}, r_{i,j+1})$. The probability that a job will complete by its deadline is determined by computing $W_{i,j}(D_i)$. Alternatively, the sub-interval distributions can be combined before $W_{i,j}(D_i)$ is computed.

Equations (1) and (2) allow the response time distributions of jobs in a level-$\phi_i$ busy interval to be computed for any combination of initial release times $\{r_{k,1} \mid 1 \le k \le n\}$. In order to compute a lower bound on the probability that jobs complete by their deadlines, the worst-case combination of release times needs to be identified. As discussed previously, an upper bound on the response time of jobs from $T_i$ according to deterministic TDA is obtained by computing the response times of jobs executed in an in-phase level-$\phi_i$ busy interval. Unfortunately, it is no longer sufficient for us to consider an in-phase busy interval. The proof that no backlog exists at the instant a job is released simultaneously with jobs of higher-priority tasks requires that the maximum total utilization of the system be no greater than one, which deterministic TDA assumes. STDA requires only that the average total utilization of the system be less than one, so some systems may not meet the condition. It is not clear what relationship between the release times of the first jobs in a level-$\phi_i$ busy interval causes some job in $T_i$ to have the maximum possible response time and hence the smallest probability of completing in time. For now, we assume that the first jobs in all tasks are released in-phase, and we discuss the rationale for this assumption later.

We now turn our attention to the matter of determining when a busy interval ends. We note that since there is a single task per priority level, a level-$\phi_i$ busy interval ends if some job $J_{i,j}$ in $T_i$ completes before the next job $J_{i,j+1}$ is released. Thus we know that the busy interval has surely ended if, for some $j$, $\mathcal{P}[w_{i,j}(r_{i,j+1}) \le r_{i,j+1}] = 1.0$.³

As an example, we now use STDA to analyze the behavior of the system of two tasks shown in Table 2. The execution time of each task is uniformly distributed (with parameters chosen to accentuate the potential for missed deadlines). The worst-case utilization of the system is 1.41 and the mean utilization of the system is 0.71. Consequently, we would expect some jobs to miss their deadlines. To determine the probability of jobs in each of the tasks missing their deadlines, we apply the procedure outlined above. Because its maximum utilization is less than 1.0, we know that $T_1$ will not miss any deadlines. Therefore we begin the analysis with $T_2$.

It is apparent that the maximum time demand of $T_2$ in the interval [0, 400) exceeds the time supply, because the sum of the maximum utilizations of the two tasks exceeds one. Because $J_{2,1}$ may not have completed by the time $J_{2,2}$ is released, the response time of $J_{2,2}$ may be greater than that of $J_{2,1}$. At the very least, we need to compute the response time distributions for $J_{2,1}$ and $J_{2,2}$.
To compute the probability that $J_{2,1}$ completes by its deadline, the interval [0, 400) is divided into sub-intervals [0, 300) and [300, 400) due to the release of $J_{1,2}$ at 300. In the first sub-interval, the time demand includes only the execution times of $J_{1,1}$ and $J_{2,1}$. The time demand of the second sub-interval includes the execution time of $J_{1,2}$, as well as the work remaining from the first sub-interval. The probability that a particular time demand occurs is conditioned on whether or not $J_{2,1}$ completes before $J_{1,2}$ is released. We first consider the interval [0, 300). The probability that $J_{2,1}$ will finish by 300 is $\mathcal{P}[w_{2,1}(300) \le 300]$, where $w_{2,1}(t)$ for $0 \le t < 300$ is computed via the sum $E_1 + E_2$ and has the density function and distribution shown in Fig. 4. The probability that $J_{2,1}$ completes by 300 is 0.668.

³ When multiple tasks have the same priority, jobs from the same priority level must have their response time distributions computed in order of increasing release times. The busy interval will have ended if all jobs with equal or higher priority released before time $t$ have completed by $t$ with probability 1.0.

Table 2. Parameters of the Tasks in Example #2. ($E_i^-$, $E_i$ and $E_i^+$ are the minimum, mean and maximum execution times; $U_i^-$, $U_i$ and $U_i^+$ are the corresponding utilizations.)

| $T_i$ | $P_i$ | $D_i$ | $E_i^-$ | $E_i$ | $E_i^+$ | $U_i^-$ | $U_i$ | $U_i^+$ |
|---|---|---|---|---|---|---|---|---|
| $T_1$ | 300 | 300 | 1 | 100 | 199 | 0.0033 | 0.333 | 0.663 |
| $T_2$ | 400 | 400 | 1 | 150 | 299 | 0.0025 | 0.375 | 0.748 |
| Total | | | | | | 0.0058 | 0.708 | 1.411 |
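The computation of $\mathcal{P}[w_{2,1}(300) \le 300]$ can be sketched numerically. We assume (our reading of Table 2) that each execution time is continuously uniform between $E_i^-$ and $E_i^+$, and we represent each density by probability masses at the midpoints of cells of width `h`; an offset records the time of the first cell, so that summing independent random variables is just a convolution of the mass vectors. The helpers `uniform_pmf`, `add` and `cdf` are our own.

```python
import numpy as np


def uniform_pmf(lo, hi, h):
    """Uniform(lo, hi) as masses at cell midpoints lo + (k + 0.5) h."""
    n = int(round((hi - lo) / h))
    return lo + 0.5 * h, np.full(n, 1.0 / n)     # (time of index 0, masses)


def add(x, y):
    """Distribution of the sum of two independent discretized variables."""
    return x[0] + y[0], np.convolve(x[1], y[1])


def cdf(x, t, h):
    """P[X <= t] for a discretized variable x = (offset, masses)."""
    offset, masses = x
    k = int(np.floor((t - offset) / h + 1e-9))   # last index at or before t
    return float(masses[: k + 1].sum()) if k >= 0 else 0.0


h = 0.5
E1 = uniform_pmf(1, 199, h)   # T_1 in Table 2
E2 = uniform_pmf(1, 299, h)   # T_2 in Table 2
demand = add(E1, E2)          # w_{2,1}(t) for 0 <= t < 300 is E_1 + E_2
print(round(cdf(demand, 300, h), 3))   # close to the 0.668 reported in the text
```

Halving `h` reduces the discretization error at the cost of vectors twice as long, the same trade-off discussed for the PERTS implementation in Section 4.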





Fig. 4. Time demand of $J_{2,1}$ over interval [0, 300): (a) density, (b) distribution.



We now compute $\mathcal{P}[w_{2,1}(400) \le 400 \mid w_{2,1}(300) > 300]$, considering $t$ in the interval [300, 400). Because $J_{2,1}$ may not have completed by time 300, there are between 0 and 198 time units of work remaining when $J_{1,2}$ is released. The density function for the backlog is the density function of Fig. 4(a) in the range 300 to 498, normalized to 1.0 as is implied by statistical conditioning. The random variable for the backlog is then added to the execution time of $J_{1,2}$. The resulting density and distribution are given in Fig. 5. The probability that $J_{2,1}$ completes by 400, given that it did not complete by 300, is 0.209, as shown in Fig. 5(b).
Fig. 5. Time demand of $J_{2,1}$ over interval [300, 400): (a) density, (b) distribution.



Combining the results of analyzing the two sub-intervals gives us the distribution of the response time of $J_{2,1}$ and thus the probability that $J_{2,1}$ completes by 400 and meets its deadline:

$$\mathcal{P}[w_{2,1}(400) \le 400] = 0.668 + (0.209)(0.332) = 0.738 . \qquad (5)$$
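The conditioning step and Equation (5) can be sketched the same way, again assuming continuous uniform execution times discretized with cell width `h` (our assumption and our helper names). The backlog at 300 is the part of the demand distribution beyond 300, renormalized; adding the execution time of $J_{1,2}$ gives the demand over [300, 400), and $J_{2,1}$ completes by 400 if that demand is at most 100.

```python
import numpy as np


def uniform_pmf(lo, hi, h):
    """Uniform(lo, hi) as masses at cell midpoints lo + (k + 0.5) h."""
    n = int(round((hi - lo) / h))
    return lo + 0.5 * h, np.full(n, 1.0 / n)


def add(x, y):
    """Sum of independent discretized variables: offsets add, masses convolve."""
    return x[0] + y[0], np.convolve(x[1], y[1])


def split_at(x, t, h):
    """Return P[X <= t] and the distribution of X - t conditioned on X > t."""
    offset, masses = x
    k = int(np.floor((t - offset) / h + 1e-9)) + 1    # first index beyond t
    p_done = float(masses[:k].sum())
    backlog = (offset + k * h - t, masses[k:] / masses[k:].sum())
    return p_done, backlog


h = 0.5
E1, E2 = uniform_pmf(1, 199, h), uniform_pmf(1, 299, h)

# Sub-interval [0, 300): demand E_1 + E_2
p_first, backlog = split_at(add(E1, E2), 300, h)
# Sub-interval [300, 400): backlog plus the execution time of J_{1,2}
p_second, _ = split_at(add(backlog, E1), 100, h)

combined = p_first + (1 - p_first) * p_second    # Equation (5)
print(round(p_first, 3), round(p_second, 3), round(combined, 3))
```

The three printed values should agree with the 0.668, 0.209 and 0.738 in the text to within the discretization error.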

The complete density and distribution functions of the response time of $J_{2,1}$ over the interval [0, 400) are given in Fig. 6. We note that the probability that $J_{2,1}$ will not complete before $r_{2,2}$ is 0.262, so it is also necessary to compute the probability that $J_{2,2}$ completes by its deadline. The analysis proceeds following the same pattern until the busy interval ends. The probability that $J_{2,2}$ completes by its deadline at 800 is 0.994. The probability that $J_{2,3}$ completes by its deadline at 1200 is 1.000. Thus a lower bound on the probability that jobs in $T_2$ meet their deadlines is 0.738.

We now return to the choice of initial phases for tasks. While we do not know what phasing causes a critical instant to occur, we hypothesize that the event occurs so infrequently that the average completion rate is not significantly affected. To test this hypothesis, we performed a series of simulation experiments on a number of systems. For each system, we determined the behavior of the system when each task $T_i$ has a randomly distributed phase in the range $(-P_i, P_i)$ and when all tasks have equal phases, i.e., are released at time 0. (We call a unique combination of phases and actual execution times of the tasks a run.) A large number of jobs in each task are released in each run. For each run, a histogram of the response times of the jobs in each task is computed. The histograms of all the runs are averaged, bin by bin, to obtain a histogram representing the average behavior of the tasks of the system. The histograms for in-phase and random-phase releases are then compared.
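The experiment can be sketched with a small event-driven simulator of a preemptive fixed-priority uniprocessor. This is our own minimal sketch, not the PERTS simulator: execution times are drawn from continuous uniform distributions (an assumption), jobs that miss their deadlines still run to completion, and jobs of the same task execute in release order. Instead of averaging histograms, it averages the per-run fraction of deadlines met, which is what Table 3 later reports. All function and variable names are hypothetical.

```python
import random
from collections import deque


def simulate(tasks, jobs, phases, rng):
    """Fraction of deadlines met per task in one run of a preemptive
    fixed-priority uniprocessor.

    tasks: list of (period, relative_deadline, sampler), highest priority first;
    jobs[i]: number of jobs of task i to release; phases[i]: its first release.
    """
    n = len(tasks)
    next_rel, left = list(phases), list(jobs)
    queues = [deque() for _ in range(n)]        # per task: [release, remaining work]
    met, done = [0] * n, [0] * n
    t = min(phases)
    while True:
        for i, (period, _, sample) in enumerate(tasks):   # release jobs due by t
            while left[i] > 0 and next_rel[i] <= t:
                queues[i].append([next_rel[i], sample(rng)])
                next_rel[i] += period
                left[i] -= 1
        pending = [next_rel[i] for i in range(n) if left[i] > 0]
        upcoming = min(pending, default=float("inf"))     # next release instant
        ready = next((i for i in range(n) if queues[i]), None)
        if ready is None:               # processor idles until the next release
            if not pending:
                break
            t = upcoming
            continue
        job = queues[ready][0]          # oldest job of the highest-priority task
        if t + job[1] <= upcoming:      # finishes before any new release
            t += job[1]
            queues[ready].popleft()
            done[ready] += 1
            met[ready] += t - job[0] <= tasks[ready][1]
        else:                           # runs until the next release, then re-decide
            job[1] -= upcoming - t
            t = upcoming
    return [m / d for m, d in zip(met, done)]


# Example #2 (Table 2), assuming continuous uniform execution times
EXAMPLE_2 = [(300, 300, lambda r: r.uniform(1, 199)),
             (400, 400, lambda r: r.uniform(1, 299))]


def average_met(phase_fn, runs, rng):
    """Average the per-task fractions over several runs of 1333 and 1000 jobs."""
    results = [simulate(EXAMPLE_2, [1333, 1000], phase_fn(), rng) for _ in range(runs)]
    return [sum(r[i] for r in results) / runs for i in range(len(EXAMPLE_2))]


rng = random.Random(1)
in_phase = average_met(lambda: [0.0, 0.0], 20, rng)
random_phase = average_met(lambda: [rng.uniform(-p, p) for p, _, _ in EXAMPLE_2], 20, rng)
print("in-phase:", in_phase, "random-phase:", random_phase)
```

Table 3 reports 80.8% (in-phase) and 81.2% (random-phase) for $T_2$ over 1000 runs; this sketch uses only 20 runs per case, so its estimates are noisier and are not claimed to reproduce those figures.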

For the tasks in Example #2, we performed 1000 runs for both in-phase and random-phase releases, each run containing the release of at least 1000 jobs in each task. The width of the 95% confidence interval on the profile of the histogram was ±5% or less, except in the tail of the density function, where the probability was small to begin with. Figure 7 shows the histograms for task $T_2$ from our example.

Fig. 6. Time demand of $J_{2,1}$ over interval [0, 400): (a) density, (b) distribution.





Fig. 7. Average response times of $T_2$: (a) density, (b) distribution (x-axis: response time, 0 to 1000).



As Fig. 7(b) shows, the average response time distribution for in-phase releases bounds the distribution for random-phase releases from below. The average response time density function, Figure 7(a), exhibits a curious saw-tooth behavior for in-phase releases. The behavior is caused by the fixed relationship between the release times of $T_1$ and $T_2$. This relationship causes the completion of jobs in $T_2$ to be delayed by jobs in $T_1$ in a periodic manner. The linearly rising shape of each tooth is due to the uniform distribution of the execution time of $T_1$, while the general shape of the curve results from the combined effect of the execution time distributions of both $T_1$ and $T_2$. Figure 8 compares the histograms for tasks with the same parameters as our example but with exponentially distributed execution times. Once again, the distribution with in-phase release bounds the distribution with random-phase release from below. Also, the in-phase release curve exhibits a similar saw-tooth shape. However, each tooth has a more rounded shape due to the exponential distribution of the execution time of $T_1$. Finally, the asymptotically decreasing shape of the density curves reflects the combined effect of the execution time distributions of both tasks.





Fig. 8. Average response times of $T_2$ (Exponential): (a) density, (b) distribution.



Despite the large number of systems simulated, we have not observed a case where tasks that are released with arbitrary phases have a lower average completion rate than the same tasks released in-phase. We therefore use in-phase busy intervals in computing a lower bound on the average completion rate using STDA.

We now compare the lower bound on the probability of meeting deadlines obtained via STDA with the percentage of deadlines met for each task in Table 2. The percentage of the jobs in each task meeting their deadlines was obtained by simulating the behavior of the system for 1000 runs. Each run released and executed 1333 jobs of $T_1$ and 1000 jobs of $T_2$ to produce a response time distribution (in the form of a histogram) for the tasks of the system. Once again, the response time distributions of the runs were averaged, bin by bin, to obtain average response time distributions for the tasks, as well as to assess statistical significance. The behavior of the system was observed both when the tasks have identical phases and when the phase of each task $T_i$ is uniformly distributed in the range $(-P_i, P_i)$. As previously observed, the average completion rate when the tasks are in-phase was lower than when the tasks have random phases, by a small but statistically significant amount at a 95% confidence level. (The simulation results shown below are for the case where the tasks are released in-phase.)
Table 3. A comparison of STDA bound with simulation results (percentage of deadlines met).

| $T_i$ | STDA | Simulation, in-phase | Simulation, random-phase | Ratio (STDA / in-phase) |
|---|---|---|---|---|
| $T_1$ | 100.0 | 100.0 ± 0.0 | 100.0 ± 0.0 | 1.000 |
| $T_2$ | 73.8 | 80.8 ± 0.1 | 81.2 ± 0.1 | 0.913 |



According to Table 3 the probability that jobs complete by their deadlines, as 
computed by STDA, bounds the percentage of deadlines met from below. The 
bound differs from the simulation results for T 2 by only 8.7%. The difference 
occurs because STDA computes the worst-case probability that jobs in the first 
busy interval meet their deadlines rather than the percentage of all jobs in the 
task that meet their deadlines. In this simple example, simulating the behavior 
of the two tasks is reasonable. However, for realistic systems with many tasks, 
simulation requires significantly greater effort than STDA. Hence STDA provides 
a faster way to determine if the probability of a missed deadline is acceptable. 

4 Implementing STDA 

In this section, we discuss an implementation of STDA in the PERTS real-time 
prototyping environment [6, 7]. PERTS is a tool which facilitates the design and 
analysis of real-time systems by applying theoretical results, where possible, or 
by simulating the system to determine its behavior. The issues we discuss are not 
particular to PERTS and must be addressed by any implementation of STDA. 

One of the main operations in STDA is the summing of random variables representing execution times. It is well known that the probability density function of the sum of two statistically independent random variables can be obtained by convolution, $f(t) = (g * h)(t) = \int g(\tau)\, h(t - \tau)\, d\tau$. The direct way to perform convolution on a digital computer is to discretize the integral using a constant spacing between samples, $f_i = \sum_j g_j h_{i-j}$. Computing $f$ by direct convolution is an $O(N^2)$ operation, where $N$ is the number of points in the discrete representations of $g$ and $h$. It has long been known that the asymptotic cost of convolution can be reduced by applying the Convolution Theorem, $g(t) * h(t) \Leftrightarrow G(f) H(f)$, where $G(f)$ and $H(f)$ are the Fourier transforms of $g(t)$ and $h(t)$, respectively. The result is an $O(N \log_2 N)$ algorithm for convolution. There are many descriptions and implementations of the FFT readily available (e.g., [2, 8, 9]).
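The difference between the two approaches can be seen with NumPy (our choice of library; the text does not say how PERTS implements the FFT). `np.convolve` evaluates the direct sum $f_i = \sum_j g_j h_{i-j}$, while the hypothetical helper `fft_convolve` multiplies transforms. Both inputs are zero-padded to a power-of-two length of at least `len(g) + len(h) - 1`, so that the circular convolution computed by the FFT equals the linear one.

```python
import numpy as np


def fft_convolve(g, h):
    """Linear convolution of two sample vectors via the Convolution Theorem.

    Both inputs are zero-padded to a common power-of-two length of at least
    len(g) + len(h) - 1, so the circular convolution computed by the FFT does
    not wrap around (no aliasing).
    """
    n_out = len(g) + len(h) - 1
    n_fft = 1 << (n_out - 1).bit_length()     # next power of two >= n_out
    G = np.fft.rfft(g, n_fft)                 # rfft zero-pads to n_fft points
    H = np.fft.rfft(h, n_fft)
    return np.fft.irfft(G * H, n_fft)[:n_out]


rng = np.random.default_rng(0)
g, h = rng.random(300), rng.random(500)
direct = np.convolve(g, h)                    # O(N^2) direct evaluation of f_i
print(np.allclose(fft_convolve(g, h), direct))   # True
```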

Three issues need to be considered when using the FFT to perform convolution. First, the discrete representations of the probability density functions being convolved must have the same sampling rate and consist of the same number of points. In our application, the vectors containing the discretized probability density functions will almost always have different sample rates and numbers of points as a result of the conditioning process. Thus new vectors must be created by interpolation before every convolution. Since interpolation can be performed in $O(N \log_2 N)$ time, the asymptotic complexity of convolution is not increased.
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Second, sufficient “zero padding” is required to ensure that aliasing does not 
occur [9]. The length of the vectors are also required to be a power of two. As a 
result, the vectors are likely to be large and sparsely populated in our applica- 
tion. Our experience indicates that the vectors are often only 50-75% filled with 
non-zero data. The final issue concerns the number of points used to represent 
the probability density functions for sufficient accuracy. 
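The first two issues can be sketched in Python as follows. Linear interpolation is our assumption, since the text does not say which interpolation scheme is used. `resample` and `padded_length` are hypothetical helper names, and this simple loop resamples in a single linear pass.

```python
import math


def resample(t0, dt, values, new_t0, new_dt, n_points):
    """Linearly interpolate samples values[k] (taken at t0 + k*dt) onto the grid
    new_t0 + i*new_dt, i = 0..n_points-1; points outside the original support get 0."""
    if len(values) < 2:
        raise ValueError("need at least two samples to interpolate")
    last = t0 + (len(values) - 1) * dt
    out = []
    for i in range(n_points):
        t = new_t0 + i * new_dt
        if t < t0 or t > last:
            out.append(0.0)          # outside the support of the density
            continue
        pos = (t - t0) / dt
        k = min(int(math.floor(pos)), len(values) - 2)   # index of the left neighbour
        frac = pos - k
        out.append(values[k] * (1.0 - frac) + values[k + 1] * frac)
    return out


def padded_length(n_g, n_h):
    """Smallest power of two that holds the full linear convolution (no aliasing)."""
    size = 1
    while size < n_g + n_h - 1:
        size *= 2
    return size
```

With the 1024-point default mentioned below, two full vectors would need a 2048-point transform.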

Figure 9(a) shows the error between the computed and exact distributions of 
response time corresponding to Fig. 4(b) as a function of the number of points 
in the discrete representation. Figure 9(b) shows the computation time as a 
function of the number of points. In order to maintain acceptable interactive 
response, we have chosen a default of 1024 points in the PERTS implementation 
of STDA, which yields a maximum absolute error of slightly over 0.005 for this 
example. 




Fig. 9. Convolution via FFT versus number of points: (a) accuracy; (b) time.



5 Conclusions and Future Work 

Using hard real-time analysis techniques to design soft real-time systems can
lead to low resource utilization, increased cost, and poor average performance. 
In this paper, we have presented the Stochastic Time Demand Analysis method 
for computing a lower bound on the percentage of jobs in a task that meet their 
deadlines under a fixed priority scheduling policy. The method enables missed 
deadlines to be balanced against other design goals such as processor utilization 
or cost. 

In addition to describing the STDA method, we have also performed a sim- 
ulation study to check the tightness of the bound. For the system used as an 
example, the bound has less than 10% error. While simulating the two-task example system may not be much more complicated or time-consuming than STDA, the effort needed to bound the probability of missed deadlines is significantly less than that required to simulate systems with many tasks. Hence STDA gives a faster way to determine whether the probability of missed deadlines is acceptable. We
have implemented the STDA method in the PERTS environment. 

While STDA improves our ability to predict the behavior of soft real-time 
systems, it is restricted to fixed priority assignments. Similar techniques need 
to be developed for systems with dynamic priority assignments, such as those 
scheduled Earliest-Deadline-First. The probability that consecutive jobs will miss 
their deadlines also needs to be computed, as many soft real-time applications 
cannot afford to miss more than a certain number of deadlines in a row. Finally, 
the behavior of systems in which execution times are dependent, periods of jobs 
vary, jobs share resources, or jobs have precedence constraints between them 
needs to be considered. 
Abstract. We consider the problem of computing concrete diagnostics for timed automata and reachability properties. Concrete means containing information both about the discrete state changes and the exact amount of time passing at each state. Our algorithm solves the problem in $O(l \cdot n^2)$ time, where $l$ is the length of the diagnostic run and $n$ the number of clocks. A prototype implementation in the tool Kronos has been used to produce a counter-example in the claimed-to-be-correct version of the collision detection protocol of [HSLL97].



1 Introduction 

When checking a system against a property, a simple yes/no answer is often 
not satisfactory. The term diagnostics is used for any kind of supplementary 
information (for instance, states, executions or sets of these) which helps the 
user understand why verification fails or succeeds. Diagnostics are important for 
the following reasons: 

- Without them, no confidence in the system’s model can be gained. For instance, in case the property is not satisfied by the model, it might be that it is not the system which is wrong, but the modeling.

- Even if the model is correct, the fault in the system cannot be easily located without any guidance.

In the particular case of timed systems modeled as dense-time automata 
(TA) [ACD93,HNSY94], there is a need for timed diagnostics, that is, concrete 
runs in the TA semantics. These runs contain information both about the dis- 
crete state changes of the system, as well as the exact time delay between two 
discrete transitions. These delays can be essential to the understanding of a 
sample behavior of the system. 

Since TA model-checking is based on abstract models rather than the concrete (i.e., semantic) one [DT98,BTY97], timed diagnostics cannot be generated directly. Until now, TA verification tools like Kronos [DOTY96,BTY97] and Uppaal [HSLL97] have been able to produce only abstract diagnostics, that is, sequences of the form $S_1 \Rightarrow \cdots \Rightarrow S_k$, where $S_1, \ldots, S_k$ are sets of states and $\Rightarrow$ is some abstract transition relation between these sets (usually corresponding to discrete steps). Then, all that is known is that some concrete execution exists which corresponds to the abstract one. In particular, all information about delays between discrete steps is lost.

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 59-73, 1999. 

© Springer-Verlag Berlin Heidelberg 1999
In this paper we show how to compute timed diagnostics for TA with respect to reachability properties. Our technique is based on first finding an abstract execution sequence like the one above, and then extracting from it the concrete states and time delays. The complexity of our algorithm is $O(l \cdot n^2)$, where $l$ is the length of the abstract sequence and $n$ is the number of clocks of the TA.

We have implemented our algorithm in the tool Kronos and used it to verify the case study presented in [HSLL97]. The case study concerns an industrial protocol by Bang&Olufsen, aimed at ensuring collision detection in a distributed
audio/video environment. [HSLL97] present two versions of the protocol: the first 
one contains an error, claimed to be corrected in the second version. Surprisingly, 
we have found an error even in the “corrected” version of the protocol. Using our 
algorithm, we have obtained a timed counter-example showing how a collision 
can pass undetected. 

2 Background 

2.1 Clocks, bounds and polyhedra 

Let R be the set of non-negative reals and A = {xi, ..., x„} be a set of variables 
in R, called clocks. An X -valuation is a function v : A R. For some A C A, 
v[A := 0] is the valuation v', such that Va; € A . v'(a;) = 0 and Vx ^ A . v'(a:) = 
v(a;). For every S G R, v -|- 5 (resp. v — <5) is a valuation such that for all x € A, 
(v-l- <5)(x) = v(x) -I- S (resp. (v — <5)(x) = v(x) — S). Two valuations v and v' are 
called c-equivalent, for c G N, if for any clock x, either v(x) = v'(x) or v(x) > c 
and v'(x) > c. 
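These definitions translate directly into code. The Python sketch below assumes a valuation is a dictionary from clock names to non-negative floats, with the same key set for every valuation; the function names are ours, not Kronos'.

```python
def reset(v, clocks):
    """v[X := 0]: clocks in X are set to 0, all others keep their value."""
    return {x: (0.0 if x in clocks else val) for x, val in v.items()}


def delay(v, d):
    """v + d (pass a negative d for v - d): every clock advances by the same amount."""
    return {x: val + d for x, val in v.items()}


def c_equivalent(v, w, c):
    """v and w are c-equivalent iff, for every clock, they agree or both exceed c."""
    return all(v[x] == w[x] or (v[x] > c and w[x] > c) for x in v)
```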

A bound [Dil89] over $\mathcal{X}$ is a constraint of the form $x_i \sim c$ or $x_i - x_j \sim c$, where $1 \le i \ne j \le n$, $\sim \in \{<, \le, \ge, >\}$ and $c \in \mathbb{N} \cup \{\infty\}$. If we introduce a “dummy” clock variable $x_0$, taken to represent 0, then bounds can be uniformly written as $x_i - x_j \prec c$, where $0 \le i \ne j \le n$, $\prec \in \{<, \le\}$ and $c \in \mathbb{Z} \cup \{\infty\}$. For example, $x_1 > 3$ can be written as $x_0 - x_1 < -3$. A bound $x_i - x_j \prec c$ is stricter than $x_i - x_j \prec' c'$ iff either $c < c'$, or $c = c'$, $\prec$ is $<$ and $\prec'$ is $\le$. (By convention, $\delta < \infty$ for any real number $\delta$.) For instance, $x_i - x_j < 3$ is stricter than $x_i - x_j \le 3$, which is stricter than $x_i - x_j < 4$, and so on. For two bounds $b$ and $b'$, $\min(b, b')$ (resp. $\max(b, b')$) is $b$ (resp. $b'$) if $b$ is stricter than $b'$, and $b'$ (resp. $b$) otherwise. An $\mathcal{X}$-valuation $v$ satisfies a bound $x_i - x_j \prec c$ iff $v(x_i) - v(x_j) \prec c$, where $v(x_0)$ is assumed to be 0.
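A common encoding of such bounds, assumed here, is a pair `(c, strict)`, where `strict` marks `<` and a non-strict bound is `<=`. Ordering pairs by `(c, not strict)` turns “stricter than” into an ordinary tuple comparison, so min and max on bounds follow immediately. The sketch is illustrative and not taken from any tool.

```python
import math

INF = math.inf

# A bound x_i - x_j ≺ c is the pair (c, strict): strict=True encodes "<",
# strict=False encodes "<=", and c may be math.inf.


def key(b):
    """Sort key: a smaller key means a stricter bound (for equal c, '<' beats '<=')."""
    c, strict = b
    return (c, not strict)


def stricter(b1, b2):
    return key(b1) < key(b2)


def bmin(b1, b2):
    return b1 if stricter(b1, b2) else b2


def bmax(b1, b2):
    return b2 if stricter(b1, b2) else b1


def satisfies(v, i, j, b):
    """Does valuation v (a list with v[0] == 0 for the dummy clock x_0) satisfy x_i - x_j ≺ c?"""
    c, strict = b
    diff = v[i] - v[j]
    return diff < c if strict else diff <= c
```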

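An $\mathcal{X}$-polyhedron $\zeta$ is a conjunction of $n \cdot (n + 1)$ bounds:

$$\zeta = \bigwedge_{0 \le i \ne j \le n} x_i - x_j \prec_{i,j} c_{i,j}$$

We write $\zeta(i, j)$ for the pair $(\prec_{i,j}, c_{i,j})$. We also write $c_{\max}(\zeta)$ for $\max\{|c_{i,j}| \mid 0 \le i \ne j \le n\}$ ($|c|$ is the absolute value of $c$ if $c \in \mathbb{Z}$, and 0 if $c = \infty$).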

An $\mathcal{X}$-polyhedron $\zeta$ defines a semantic entity, namely, the set of valuations satisfying all its bounds. If the bounds of $\zeta$ are unsatisfiable, $\zeta$ defines the empty valuation set. Several different $\mathcal{X}$-polyhedra might define the same valuation set, as is the case for $\zeta_1 : x \le 2 \wedge y \le 2 \wedge 0 \le x - y \le 2$ and $\zeta_2 : x \le 2 \wedge y \le 2 \wedge 0 \le x - y \le 3$. In the sequel, we consider only polyhedra that are in canonical form, that is, where all constraints are as tight as possible.¹ This is the case for $\zeta_1$ above, but not for $\zeta_2$.

Canonical form reduces semantic equality to syntactic equality, so that the valuation sets defined by $\zeta$ and $\zeta'$ are equal iff $\zeta(i, j)$ and $\zeta'(i, j)$ are identical for all $0 \le i, j \le n$. Other semantic operations on polyhedra, such as intersection $\zeta \cap \zeta'$, inclusion $\zeta \subseteq \zeta'$, time successors $\nearrow\zeta = \{v \mid \exists \delta \in \mathbb{R} .\ v - \delta \in \zeta\}$ and clock reset $\zeta[X := 0] = \{v[X := 0] \mid v \in \zeta\}$, also have corresponding syntactic transformations (see [Dil89,Yov93] for details of their implementation).
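Canonical forms are typically computed by treating the bounds as a weighted graph (a difference bound matrix, DBM) and tightening every entry with an all-pairs shortest-path pass. The Python sketch below uses Floyd-Warshall, which matches the $O(n^3)$ cost mentioned for [Dil89] but is our choice, not necessarily the algorithm of that reference. It uses a `(c, strict)` encoding of bounds, and the test rebuilds $\zeta_1$ and $\zeta_2$ from the text under the assumption that their constraints are non-strict and that clocks are non-negative.

```python
import math

INF = math.inf
LE0 = (0, False)          # the bound "<= 0"
# dbm[i][j] = (c, strict) bounds x_i - x_j ('<' if strict, else '<='); index 0 is x_0.


def key(b):
    c, strict = b
    return (c, not strict)   # smaller key = stricter bound


def add(b1, b2):
    """x_i - x_k ≺1 c1 and x_k - x_j ≺2 c2 imply x_i - x_j ≺ c1 + c2 ('<' if either is '<')."""
    return (b1[0] + b2[0], b1[1] or b2[1])


def canonical(dbm):
    """Tighten every bound via shortest paths (Floyd-Warshall): O(n^3) for n clocks."""
    d = [row[:] for row in dbm]
    size = len(d)
    for k in range(size):
        for i in range(size):
            for j in range(size):
                via_k = add(d[i][k], d[k][j])
                if key(via_k) < key(d[i][j]):
                    d[i][j] = via_k
    return d


def is_empty(d):
    """A canonical DBM is empty iff some diagonal entry is stricter than '<= 0'."""
    return any(key(d[i][i]) < key(LE0) for i in range(len(d)))
```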

The above operations are used in section 3 for performing abstract reacha- 
bility analysis and computing timed diagnostics. An important operation in this 
analysis is the one guaranteeing termination, separately presented below. 



c-closure. Given an $\mathcal{X}$-polyhedron $\zeta$ and a natural constant $c$, the c-closure of $\zeta$, denoted $\mathrm{close}(\zeta, c)$, is the greatest $\mathcal{X}$-polyhedron $\zeta'$ such that for all $v' \in \zeta'$ there exists $v \in \zeta$ such that $v$ and $v'$ are c-equivalent. Intuitively, $\zeta'$ is obtained from $\zeta$ by “ignoring” all bounds which involve constants greater than $c$. An example is shown in figure 1.
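One way to realize the intuition of ignoring large constants, assumed here and known in many DBM libraries as extrapolation, is to drop every bound whose constant exceeds $c$, weaken every bound below $-c$ to `< -c`, and then re-canonicalize. We do not claim that this coincides exactly with the greatest polyhedron in the definition above; the sketch only illustrates the idea. The test uses the zone $y = x + 4$ with $c = 2$ from the example in section 3.2.

```python
import math

INF = math.inf


def key(b):
    return (b[0], not b[1])          # bounds are (c, strict); smaller key = stricter


def canonical(d):
    """All-pairs shortest-path tightening of a DBM (Floyd-Warshall)."""
    d = [row[:] for row in d]
    for k in range(len(d)):
        for i in range(len(d)):
            for j in range(len(d)):
                via = (d[i][k][0] + d[k][j][0], d[i][k][1] or d[k][j][1])
                if key(via) < key(d[i][j]):
                    d[i][j] = via
    return d


def close(d, c):
    """Hypothetical extrapolation-style reading of close(zeta, c)."""
    out = [row[:] for row in d]
    for i in range(len(d)):
        for j in range(len(d)):
            if i == j:
                continue
            const, _ = d[i][j]
            if const > c:
                out[i][j] = (INF, True)      # bound ignored: x_i - x_j unbounded above
            elif const < -c:
                out[i][j] = (-c, True)       # only "x_i - x_j < -c" is kept
    return canonical(out)
```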



Fig. 1. An example of c-closure: a polyhedron $\zeta$ and its closure $\mathrm{close}(\zeta, c)$ in the $(x, y)$ plane.



$\zeta$ is said to be c-closed if $\mathrm{close}(\zeta, c) = \zeta$.

Lemma 1. For any constant $c$, there is a finite number of c-closed $\mathcal{X}$-polyhedra.

The above result will be used in section 3.1 to prove termination of the reacha- 
bility algorithm. 

¹ A canonical form can be computed for any non-empty polyhedron, as shown in [Dil89], using an $O(n^3)$ algorithm, where $n$ is the number of clocks.
2.2 Timed automata

A timed automaton (TA) [ACD93,HNSY94] is a tuple $A = (\mathcal{X}, Q, E)$, where:

- $\mathcal{X}$ is a finite set of clocks.

- $Q$ is a finite set of discrete states.

- $E$ is a finite set of edges of the form $e = (q, \zeta, X, q')$, where $q, q' \in Q$ are the source and target discrete states, $\zeta$ is a conjunction of atomic constraints on $\mathcal{X}$, called the guard of $e$, and $X \subseteq \mathcal{X}$ is a set of clocks to be reset.

Given an edge $e = (q, \zeta, X, q')$, we write $\mathrm{guard}(e)$ and $\mathrm{reset}(e)$ for $\zeta$ and $X$, respectively. Given a discrete state $q$, we write $\mathrm{out}(q)$ for the set of edges of the form $(q, \_)$. Finally, we write $c_{\max}(A)$ for the maximum of $c_{\max}(\zeta)$, where $\zeta$ is a guard or an invariant of $A$.

States. A state of $A$ is a pair $(q, v)$, where $q \in Q$ is a location and $v$ is an $\mathcal{X}$-valuation. Two states $(q, v_1)$ and $(q, v_2)$ are c-equivalent if $v_1$ and $v_2$ are c-equivalent.

Transitions. Consider a state $s = (q, v)$. We write $s + \delta$ instead of $(q, v + \delta)$. A timed transition from $s$ has the form $s \xrightarrow{\delta} s + \delta$, where $\delta \in \mathbb{R}$; $s + \delta$ is the δ-successor of $s$. A discrete transition with respect to an edge $e = (q, \zeta, X, q')$ such that $v \in \zeta$ has the form $s \xrightarrow{e} s'$, where $s' = (q', v[\mathrm{reset}(e) := 0])$; $s'$ is the e-successor of $s$.

We write $s \xrightarrow{e}\xrightarrow{\delta} s'$ if either $\delta = 0$ and $s \xrightarrow{e} s'$ is a discrete transition, or $\delta > 0$, $s \xrightarrow{e} s''$ is a discrete transition and $s'' \xrightarrow{\delta} s'$ is a timed transition.
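The two kinds of transitions can be sketched as below, assuming guards are given as Python predicates over valuations (a simplification of the conjunctions of constraints used in the text). The edge in the test, with guard $x = 2$ and reset $\{x\}$, is our reading of edge $e_1$ in the example of section 3.2, since figure 3 itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Valuation = Dict[str, float]


@dataclass(frozen=True)
class Edge:
    source: str
    guard: Callable[[Valuation], bool]     # guard(e), given here as a predicate
    reset: FrozenSet[str]                   # reset(e)
    target: str


def timed(state, d):
    """s --d--> s + d: let d >= 0 time units elapse."""
    q, v = state
    assert d >= 0
    return (q, {x: val + d for x, val in v.items()})


def discrete(state, e):
    """s --e--> s': enabled iff s is at e.source and its valuation satisfies guard(e)."""
    q, v = state
    if q != e.source or not e.guard(v):
        return None                          # e is not enabled in s
    return (e.target, {x: (0.0 if x in e.reset else val) for x, val in v.items()})


def step(state, e, d):
    """s --e-->--d--> s': a discrete e-transition followed by a d-delay."""
    s2 = discrete(state, e)
    return None if s2 is None else timed(s2, d)
```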

Lemma 2. Let $A$ be a TA, $c \ge c_{\max}(A)$ and $s_1, s_2$ be two c-equivalent states of $A$. Then, $s_1 + \delta$ and $s_2 + \delta$ are c-equivalent for any $\delta \in \mathbb{R}$. Moreover, for any edge $e$, if $s_1 \xrightarrow{e} s_1'$ is a discrete transition, then $s_2 \xrightarrow{e} s_2'$ is a discrete transition such that $s_1'$ and $s_2'$ are c-equivalent.

Intuitively, the above lemma says that two states of $A$ have essentially the same executions if they agree on their clock values, except perhaps for those clocks which have grown greater than $c_{\max}(A)$.² The result will be used in section 3.1 for proving correctness of the reachability algorithm.

Runs. A run of $A$ starting from state $s_1$ and reaching state $s_k$ is a sequence $\rho = s_1 \xrightarrow{e_1} s_1' \xrightarrow{\delta_1} s_2 \xrightarrow{e_2} \cdots \xrightarrow{\delta_{k-1}} s_k$ such that for all $i = 1, \ldots, k - 1$, $s_i'$ is the $e_i$-successor of $s_i$ and $s_{i+1}$ is the $\delta_i$-successor of $s_i'$. We say that $s_k$ is reachable from $s_1$.

Reachability problem and diagnostic runs. The reachability problem for $A$ is, given a set of initial states $S_1$ and a set of target states $S$, to find whether there exists a run starting from some state $s_1 \in S_1$ and reaching some state $s \in S$. Providing timed diagnostics means exhibiting such a run.

² In fact, c-equivalence is a bisimulation.
3 Reachability with diagnostics

Consider a TA $A$ and two sets $S_1$ and $S$ of initial and target states, respectively. We check whether $S_1$ can reach $S$ and, if so, provide timed diagnostics. Our method consists of two steps:

- First, we apply the reachability algorithm of [DT98] to check whether there exists some run from $S_1$ to $S$. If reachability succeeds, the algorithm generates an abstract path, that is, a sequence $\pi = S_1 \xrightarrow{e_1} S_2 \xrightarrow{e_2} \cdots \xrightarrow{e_{k-1}} S_k$, where the $S_i$ are sets of states and the $e_i$ are edges of $A$.

- Then, we extract a run $\rho = s_1 \xrightarrow{e_1}\xrightarrow{\delta_1} s_2 \cdots \xrightarrow{e_{k-1}}\xrightarrow{\delta_{k-1}} s_k$ such that $\rho$ is inscribed in $\pi$, that is: (1) $s_1 \in S_1$, (2) $s_k \in S_k$ and $s_k \in S$, and (3) for each $1 \le i < k$, $s_i \in S_i$ and $s_i \xrightarrow{e_i}\xrightarrow{\delta_i} s_{i+1}$.

3.1 Checking reachability

A zone of $A$ is a set of states $\{(q, v) \mid v \in \zeta\}$, where $\zeta$ is an $\mathcal{X}$-polyhedron. For simplicity, we denote such a zone by $(q, \zeta)$.

Given a zone $(q, \zeta)$, an edge $e = (q, \zeta', X, q')$ and a natural constant $c$, we define

$$\mathrm{post}(q, \zeta, e, c) = \big(q', \mathrm{close}(\nearrow((\zeta \cap \zeta')[X := 0]), c)\big)$$

Notice that the result of post() is a zone, since polyhedra are closed under the operations of intersection, clock reset, projection and c-closure. Also observe that the operator is monotonic, that is, $\zeta_1 \subseteq \zeta_2$ implies $\mathrm{post}(q, \zeta_1, e, c) \subseteq \mathrm{post}(q, \zeta_2, e, c)$.
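Putting the DBM operations together gives a sketch of post(). It assumes a `(c, strict)` bound encoding, canonical DBMs with index 0 for the dummy clock, and an extrapolation-style reading of close() (drop constants above $c$, weaken those below $-c$). The reset and time-successor steps are the standard DBM transformations, written from scratch here. The test replays the two post() steps of the example in section 3.2, assuming both edges have guard $x = 2$ and reset $\{x\}$.

```python
import math

INF = math.inf
LE0 = (0, False)                      # the bound "<= 0"
# DBM d over clocks x_1..x_n: d[i][j] = (c, strict) bounds x_i - x_j; index 0 is x_0 = 0.


def key(b):
    return (b[0], not b[1])           # smaller key = stricter bound


def bmin(b1, b2):
    return b1 if key(b1) < key(b2) else b2


def canonical(d):
    d = [row[:] for row in d]
    for k in range(len(d)):
        for i in range(len(d)):
            for j in range(len(d)):
                via = (d[i][k][0] + d[k][j][0], d[i][k][1] or d[k][j][1])
                d[i][j] = bmin(via, d[i][j])
    return d


def unconstrained(n):
    """DBM for 'all n clocks are >= 0' and nothing else."""
    return [[LE0 if i == j or i == 0 else (INF, True) for j in range(n + 1)]
            for i in range(n + 1)]


def is_empty(d):
    return any(key(d[i][i]) < key(LE0) for i in range(len(d)))


def intersect(d1, d2):
    return canonical([[bmin(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(d1, d2)])


def reset(d, clocks):
    """zeta[X := 0] on a canonical DBM: each reset clock becomes a copy of x_0."""
    d = [row[:] for row in d]
    for r in clocks:
        for j in range(len(d)):
            d[r][j] = d[0][j]
            d[j][r] = d[j][0]
        d[r][r] = LE0
    return d


def up(d):
    """Time successors: remove the upper bound x_i - x_0 of every clock."""
    d = [row[:] for row in d]
    for i in range(1, len(d)):
        d[i][0] = (INF, True)
    return d


def close(d, c):
    """Assumed extrapolation-style c-closure, followed by re-canonicalization."""
    out = [row[:] for row in d]
    for i in range(len(d)):
        for j in range(len(d)):
            if i != j and d[i][j][0] > c:
                out[i][j] = (INF, True)
            elif i != j and d[i][j][0] < -c:
                out[i][j] = (-c, True)
    return canonical(out)


def post(zone, guard, clocks_reset, c):
    """close(up((zone ∩ guard)[reset := 0]), c), or None if the guard is not satisfiable."""
    g = intersect(zone, guard)
    if is_empty(g):
        return None
    return close(canonical(up(reset(g, clocks_reset))), c)
```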

The essential properties of post() are stated in the following lemma.

Lemma 3. If $(q', \zeta') = \mathrm{post}(q, \zeta, e, c)$, then:

1. For each $v \in \zeta$ and each $\delta \in \mathbb{R}$, if $(q, v) \xrightarrow{e}\xrightarrow{\delta} (q', v')$, then $v' \in \zeta'$.

2. For each $v' \in \zeta'$, there exist $\delta \in \mathbb{R}$, $v \in \zeta$ and $v''$ such that $(q, v) \xrightarrow{e}\xrightarrow{\delta} (q', v'')$ and $v'$, $v''$ are c-equivalent.

Intuitively, $\mathrm{post}(q, \zeta, e, c)$ contains all successor states of $(q, \zeta)$ by a discrete e-transition followed by a timed transition. Since the final result is closed under c-equivalence, some states might be added which are not direct successors; however, they are c-equivalent to some direct successor.

Based on the above lemma, we develop the algorithm of figure 2. The algorithm uses a depth-first search (DFS) to explore all successor zones of the initial zone $(q_1, \zeta_1)$. The search stops when either a zone is found which intersects the target zone $(q, \zeta)$ (line 1) or no more zones are left to be explored.

Visit is the set of zones already visited, initially empty. Each new successor zone is inserted in Visit when the DFS procedure is called recursively (line 2). For each outgoing edge of the current zone $(q_1, \zeta_1)$, DFS generates its successor (line 3). Empty successors are ignored (line 4). The same is true of any
/* Precondition: c ≥ max(cmax(A), cmax(ζ)) */

DFS((q1, ζ1), (q, ζ)) {
    if (q1 = q ∧ ζ1 ∩ ζ ≠ ∅) then return “Yes” ;                  (1)
    Visit := Visit ∪ {(q1, ζ1)} ;                                  (2)
    for each (e ∈ out(q1)) do
        (q', ζ') := post(q1, ζ1, e, c) ;                           (3)
        if (ζ' = ∅) then continue ;                                (4)
        else if (∃(q', ζ'') ∈ Visit . ζ' ⊆ ζ'') then continue ;    (5)
        else DFS((q', ζ'), (q, ζ)) ;
    end for each
}



Fig. 2. A DFS for Yes/No reachability. 
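The search of figure 2 can be sketched generically in Python. The sketch assumes callbacks for post() (`successors`), emptiness, inclusion and the target test of line (1), and keeps the current sequence of edges and zones on a list so that the abstract path can be returned along with the answer. Unlike the figure, the result of the recursive call is propagated, so the search stops at the first hit. The test uses toy zones (a discrete state paired with a finite set of integers) rather than polyhedra.

```python
def dfs_reach(init, successors, is_empty, included, hits_target):
    """Return the abstract path [(None, z0), (e1, z1), ...] to a target zone, or None.
    successors(z) yields (edge, post(z, edge)) pairs; included(z1, z2) tests z1 ⊆ z2
    for zones with the same discrete state."""
    visit = []
    path = []

    def dfs(zone):
        if hits_target(zone):                        # (1)
            return True
        visit.append(zone)                           # (2)
        for edge, succ in successors(zone):          # (3)
            if is_empty(succ):                       # (4) empty successor
                continue
            if any(included(succ, z) for z in visit):    # (5) already covered
                continue
            path.append((edge, succ))
            if dfs(succ):
                return True
            path.pop()                               # backtrack
        return False

    return [(None, init)] + path if dfs(init) else None
```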



successor $(q', \zeta')$ which is contained in an already visited zone $(q', \zeta'')$ (line 5): since post() is monotonic, all successors of $(q', \zeta')$ are contained in the corresponding successors of $(q', \zeta'')$; thus, $(q', \zeta')$ does not have to be explored further.

The algorithm generates pairs $(q, \zeta)$, where $q$ is a discrete state and $\zeta$ is a c-closed polyhedron. By definition there is a finite number of discrete states, and by lemma 1 there is a finite number of c-closed polyhedra; thus, the algorithm terminates. Correctness follows from lemmas 2 and 3.

As presented, the algorithm of figure 2 only gives a yes/no answer to the 
reachability problem. It is easy to see how the abstract path reaching $(q, \zeta)$ can
also be returned: a DFS is usually implemented using a stack to keep the current 
sequence of zones and edges explored. This sequence corresponds exactly to the 
abstract path. In what follows, we show how to obtain more, namely, how to 
extract a run from the abstract path. 

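3.2 Extracting runs from abstract paths

Let $\pi = (q_1, \zeta_1) \xrightarrow{e_1} \cdots \xrightarrow{e_l} (q_{l+1}, \zeta_{l+1})$ be the abstract path returned by the DFS of figure 2, where for each $i = 1, \ldots, l$, $(q_{i+1}, \zeta_{i+1}) = \mathrm{post}(q_i, \zeta_i, e_i, c)$, and $q_{l+1} = q$ with $\zeta_{l+1} \cap \zeta \ne \emptyset$. For simplicity, we assume that $\zeta_{l+1} \subseteq \zeta$ (otherwise, we can just replace $\zeta_{l+1}$ by $\zeta_{l+1} \cap \zeta$).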

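We show how to build a run inscribed in $\pi$. The run is built in two passes, first backwards and then forwards:

- Backward pass: initially we choose $s_{l+1} \in (q_{l+1}, \zeta_{l+1})$ and then successively compute $\delta_i \in \mathbb{R}$ and $s_i \in (q_i, \zeta_i)$, for $i = l, \ldots, 1$, such that $s_i \xrightarrow{e_i}\xrightarrow{\delta_i} s_{i+1}'$ for some $s_{i+1}'$ which is c-equivalent to $s_{i+1}$.

- Forward pass: starting from $s_1' = s_1 \in (q_1, \zeta_1)$, we compute $s_i'$, for $i = 2, \ldots, l + 1$, based on $s_{i-1}'$, $e_{i-1}$ and $\delta_{i-1}$. The final run is $s_1' \xrightarrow{e_1}\xrightarrow{\delta_1} s_2' \cdots \xrightarrow{e_l}\xrightarrow{\delta_l} s_{l+1}'$.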

Intuitively, the backward pass generates an invalid run which might contain 
some “jumps” among c-equivalent states. The forward pass corrects the run by 
“adjusting” the clocks which have grown greater than $c_{\max}(A)$.
Before describing the two passes in detail, we show how choosing a state in a zone $(q, \zeta)$ can be done effectively. In fact, this comes down to extracting a valuation $v \in \zeta$. In the sequel, we assume that the set of clocks is $\mathcal{X} = \{x_1, \ldots, x_n\}$.

Extracting valuations from polyhedra. A k-incomplete valuation is a valuation $v$ on $\{x_1, \ldots, x_k\}$. We say that $v$ can be completed in $\zeta$ if there exists an $\mathcal{X}$-valuation $v' \in \zeta$ such that $v'(x_j) = v(x_j)$ for all $j \le k$. Completing $v$ in $\zeta$ means finding such a $v'$. Notice that we permit $k = 0$, so that completing a 0-incomplete valuation in $\zeta$ means extracting a valuation from $\zeta$.

Lemma 4. Let $\zeta$ be an $\mathcal{X}$-polyhedron and $v$ be a k-incomplete valuation. It takes $O(n^2)$ time to complete $v$ in $\zeta$, or to find that this is not possible.

Proof. Let $\zeta = \bigwedge_{0 \le i \ne j \le n} x_i - x_j \prec_{i,j} c_{i,j}$. For $i = 0, \ldots, k$, we define:

$$\delta_i = \begin{cases} 0, & \text{if } i = 0 \\ v(x_i), & \text{if } 1 \le i \le k \end{cases}$$

Then, for $i = k + 1, \ldots, n$, we choose $\delta_i$ such that:

$$\forall\, 0 \le j < i .\ \ \delta_j - c_{j,i} \prec_{j,i} \delta_i \prec_{i,j} \delta_j + c_{i,j}$$

If such a $\delta_i$ cannot be chosen for some $i$, then $v$ cannot be completed. Otherwise, we let $v'(x_i) = \delta_i$ for $i = 1, \ldots, n$. It is easy to see that $v' \in \zeta$.

Regarding complexity, in the worst case we have $k = 0$, meaning that we have to perform $n \cdot (n - 1) + n$ comparisons and additions of bounds.
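The proof is constructive, so it can be turned into code directly. The Python sketch below assumes a canonical DBM with `(c, strict)` bounds, uses exact `Fraction` arithmetic so that strict bounds can be met by taking a midpoint, and adds one check the proof leaves implicit: that the given prefix $x_1, \ldots, x_k$ is itself consistent with $\zeta$. The helper names are ours.

```python
import math
from fractions import Fraction

INF = math.inf


def _pick(lo, lo_strict, hi, hi_strict):
    """Pick a value between lo and hi respecting open/closed ends, or None if empty."""
    if lo == -INF and hi == INF:
        return Fraction(0)
    if lo == -INF:
        return hi - 1 if hi_strict else hi
    if hi == INF:
        return lo + 1 if lo_strict else lo
    if lo < hi:
        return (lo + hi) / 2 if lo_strict else lo
    if lo == hi and not lo_strict and not hi_strict:
        return lo
    return None


def complete(dbm, partial):
    """Complete `partial` (values of x_1..x_k) in the canonical DBM `dbm`, where
    dbm[i][j] = (c, strict) bounds x_i - x_j. Returns [0, v(x_1), ..., v(x_n)] or None."""
    n = len(dbm) - 1
    delta = [Fraction(0)] + [Fraction(p) for p in partial]
    # Extra check (not in the proof): the given prefix must itself satisfy zeta.
    for i in range(len(delta)):
        for j in range(len(delta)):
            c, strict = dbm[i][j]
            diff = delta[i] - delta[j]
            if c != INF and (diff >= c if strict else diff > c):
                return None
    for i in range(len(delta), n + 1):
        lo, lo_strict, hi, hi_strict = -INF, False, INF, False
        for j in range(i):
            c_ji, s_ji = dbm[j][i]       # x_j - x_i ≺ c_ji  gives  delta_j - c_ji ≺ x_i
            if c_ji != INF:
                cand = delta[j] - c_ji
                if cand > lo or (cand == lo and s_ji):
                    lo, lo_strict = cand, s_ji
            c_ij, s_ij = dbm[i][j]       # x_i - x_j ≺ c_ij  gives  x_i ≺ delta_j + c_ij
            if c_ij != INF:
                cand = delta[j] + c_ij
                if cand < hi or (cand == hi and s_ij):
                    hi, hi_strict = cand, s_ij
        value = _pick(lo, lo_strict, hi, hi_strict)
        if value is None:
            return None                  # no admissible value for x_i
        delta.append(value)
    return delta
```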

Backward pass. It suffices to show how the computation is done for a single step, say, $(q_1, \zeta_1) \xrightarrow{e} (q_2, \zeta_2)$. That is, given $v_2 \in \zeta_2$, we shall show how to compute $\delta \in \mathbb{R}$ and $v_1 \in \zeta_1$ such that $(q_1, v_1) \xrightarrow{e}\xrightarrow{\delta} (q_2, v_2')$, and $v_2, v_2'$ are c-equivalent.

Finding $\delta$ can be done by “pulling $v_2$ backward in time” until some clock reset in $e$ reaches 0. More precisely, if $\mathrm{reset}(e) = \emptyset$, then we let $\delta = 0$; otherwise, we let $\delta = v_2(x)$ for some $x \in \mathrm{reset}(e)$.

Now, let $v_3 = v_2 - \delta$. By definition, we have $v_3 \in \zeta_2$ and $(q_2, v_3) \xrightarrow{\delta} (q_2, v_2)$. It remains to find $v_1 \in \zeta_1$ such that $(q_1, v_1) \xrightarrow{e} (q_2, v_4)$ and $v_4$ and $v_3$ are c-equivalent, which implies that $v_4 + \delta$ and $v_2$ are also c-equivalent.

Without loss of generality, we assume that there exists $0 \le k \le n$ such that the clocks $x_1, \ldots, x_k$ are not reset in $e$ and $v_3(x_j) \le c$ for each $j = 1, \ldots, k$.

First, $v_1$ should satisfy $\mathrm{guard}(e)$. Moreover, since clocks $x_1, \ldots, x_k$ are not reset in $e$, they should have the same value in $v_1$ and $v_3$. We therefore let $v$ be a k-incomplete valuation such that $v(x_i) = v_3(x_i)$ for $i = 1, \ldots, k$. Using lemma 4, we can complete $v$ in $\zeta_1 \cap \mathrm{guard}(e)$. This is always possible, by the second part of lemma 3.

Therefore, we define $v_1$ to be the completed valuation. If we let $v_4 = v_1[\mathrm{reset}(e) := 0]$, we have:
- for $i = 1, \ldots, k$, $v_4(x_i) = v_1(x_i) = v_3(x_i)$;

- for $i = k + 1, \ldots, n$:

  • if $x_i \in \mathrm{reset}(e)$, then $v_4(x_i) = v_3(x_i) = 0$;

  • otherwise, $v_4(x_i) > c$ and $v_3(x_i) > c$.

That is, $v_4$ and $v_3$ are c-equivalent.

Regarding the complexity of the backward pass, observe that for each step, it takes $O(n)$ time to find the delay $\delta$ and $O(n^2)$ time to complete the valuation.³ Therefore, the whole pass can be performed in time $O(l \cdot n^2)$.
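A single backward step can be sketched as follows. It assumes valuations are dictionaries, takes the completion of lemma 4 as a callback (`complete_in_zeta1_and_guard`, a hypothetical name), and picks the alphabetically first clock when several are reset. Putting the non-reset clocks with value at most $c$ into the partial valuation is our reading of the “without loss of generality” condition. The test replays the two steps of the example that follows, with hand-written completions.

```python
def backward_step(v2, reset_clocks, c, complete_in_zeta1_and_guard):
    """One backward step for (q1, zeta1) --e--> (q2, zeta2); returns (delta, v1, v3).
    v2: dict clock -> value in zeta2; reset_clocks: reset(e); c: the closure constant.
    complete_in_zeta1_and_guard: hypothetical callback completing a partial valuation
    (dict) in zeta1 ∩ guard(e), returning a full dict or None (lemma 4)."""
    # Pull v2 back in time until a reset clock reaches 0 (delta = 0 if nothing is reset).
    delta = v2[min(reset_clocks)] if reset_clocks else 0
    v3 = {x: val - delta for x, val in v2.items()}
    # Clocks that are not reset and have not grown beyond c must keep their value.
    partial = {x: val for x, val in v3.items() if x not in reset_clocks and val <= c}
    v1 = complete_in_zeta1_and_guard(partial)
    if v1 is None:
        raise ValueError("completion failed; lemma 3 rules this out for a genuine post()")
    return delta, v1, v3
```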



Forward pass. This pass is easy. We start from $s_1 = (q_1, v_1)$, as computed in the backward pass. Then, for $i = 1, \ldots, l + 1$, we compute $v_i'$ by “adjusting” $v_i$ as follows:

- $v_1' = v_1$;

- for $i = 2, \ldots, l + 1$, $v_i' = (v_{i-1}'[\mathrm{reset}(e_{i-1}) := 0]) + \delta_{i-1}$.

Using lemma 2 and induction on $i$, it is easy to prove that the resulting run is valid, that is, $(q_i, v_i') \xrightarrow{e_i}\xrightarrow{\delta_i} (q_{i+1}, v_{i+1}')$ for all $i = 1, \ldots, l$.

The complexity of the forward pass is $O(l \cdot n)$. Therefore, the complexity of computing the whole run is $O(l \cdot n^2)$.
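The forward pass is a simple fold over the path. In the sketch below (hypothetical names), `resets[i]` and `delays[i]` are the reset set and the delay attached to the (i+1)-th edge; the test uses the values of the example that follows, where $x$ is reset on both edges and the delays are 2 and 0.

```python
def forward_pass(v1, resets, delays):
    """Return [v'_1, ..., v'_{l+1}] with v'_1 = v1 and
    v'_{i+1} = v'_i[reset(e_i) := 0] + delta_i."""
    run = [dict(v1)]
    for reset_i, delta_i in zip(resets, delays):
        prev = run[-1]
        # Reset the clocks of this edge, then let delta_i time units elapse.
        run.append({x: (0 if x in reset_i else val) + delta_i for x, val in prev.items()})
    return run
```

Each step costs $O(n)$, giving the $O(l \cdot n)$ bound stated above.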



Example. Consider the simple TA shown in figure 3. We are interested in reachability of the target zone $(q_3, \mathit{true})$ from the initial zone $(q_1, x = y)$. Let $e_1$ be the edge from $q_1$ to $q_2$ and $e_2$ the edge from $q_2$ to $q_3$. The algorithm of figure 2 succeeds, returning the abstract path $(q_1, x = y) \xrightarrow{e_1} (q_2, y = x + 2) \xrightarrow{e_2} (q_3, y > x + 2)$. Notice that for this example $c = 2$ and, before applying close(), the polyhedron associated with $q_3$ is $y = x + 4$.

For the backward pass, we start by choosing $v_3 \in y > x + 2$, say, $v_3 = (x = 0, y = 3)$. Since $x$ is reset in $e_2$, the delay following $e_2$ is $v_3(x) = 0$. Then, we must complete a 0-incomplete valuation in $y = x + 2 \wedge x = 2$, which gives us $v_2 = (x = 2, y = 4)$. Since $x$ is reset in $e_1$, the delay following $e_1$ is $v_2(x) = 2$. Finally, we have to complete a 0-incomplete valuation in $y = x \wedge x = 2$, which gives us $v_1 = (x = 2, y = 2)$. At the end of the backward pass, we have the sequence $(q_1, x = 2, y = 2) \xrightarrow{e_1} (q_2, x = 0, y = 2) \xrightarrow{2} (q_2, x = 2, y = 4) \xrightarrow{e_2} (q_3, x = 0, y = 3)$. This is not a valid run, since there is a “jump” of clock $y$ on the $e_2$-transition.

The forward pass adjusts $v_3$ to $v_3' = (x = 0, y = 4)$, yielding the final (valid) run: $(q_1, x = 2, y = 2) \xrightarrow{e_1} (q_2, x = 0, y = 2) \xrightarrow{2} (q_2, x = 2, y = 4) \xrightarrow{e_2} (q_3, x = 0, y = 4) \xrightarrow{0} (q_3, x = 0, y = 4)$.

³ Completing a valuation in the intersection of more than one polyhedron, say, $\zeta_1 \cap \cdots \cap \zeta_m$, multiplies the complexity of the operation by only a constant factor $m$.
Fig. 3. A simple example, with the zones at $q_1$, $q_2$ and $q_3$ shown beside each location.



4 Case study: Bang&Olufsen’s Collision-Detection Protocol



We have implemented the technique presented in the previous section in the real-time verification tool Kronos, which can now provide timed diagnostics when reachability analysis succeeds.⁴

We have used Kronos to verify the industrial Bang&Olufsen protocol, 
treated with Uppaal in [HSLL97]. The TA models of Kronos and Uppaal are 
essentially the same, so that translating the specification of [HSLL97] to Kronos 
format was almost straightforward. The protocol is only briefly presented here; 
the reader is referred to the above paper for more details. 

Brief description and modeling. The role of the protocol is to ensure collision de- 
tection in a distributed environment of components exchanging messages through 
a common multiple-access bus. The system modeled has two transmission com- 
ponents A and B (identical up to renaming) and the bus. Since we are interested 
only in the collision-detection protocol, the reception components are not modeled. A and B each consist of three sub-components, namely, the sender, the detector
and the frame generator. The sender handles transmission of messages, which 
are grouped in frames. The latter are generated by the frame generator. The 
detector is responsible for collision detection. 

The components along with their communication channels are shown in fig- 
ure 4. Each component is modeled as an automaton. The two senders are modeled 

⁴ The implementation is actually compatible with the new features of Kronos, including discrete variables (of type boolean, bounded integer or enumerative), message passing, and a variable-dimension DBM library which exploits the activity of clocks [DT98] to reduce the size of the state space.
Fig. 4. Bang&Olufsen’s protocol: general architecture.



by timed automata, whereas the rest of the automata are untimed.⁵ Figure 5 shows the TA for sender A. The figure is merely intended to give an impression of the complexity of the case study and the modeling issues involved. In particular, Uppaal uses so-called “committed locations”, which are not a standard feature of Kronos. However, they can easily be modeled as described in appendix A.

The most interesting feature of the protocol is its timing constraints, which concern the frequency of the senders’ polling on the bus, the encoding of messages and the waiting delay required before retransmitting after a collision. For instance, a sender samples the value of the bus (1 for high voltage, 0 for low voltage) twice every 781 microseconds. Also, there are 5 different types of messages, and the i-th message is encoded by the presence of a 1 on the bus for $2 \cdot 1562 \cdot i$ microseconds. Finally, the jamming signal after a collision is a continuous 1 on the bus for 25 milliseconds.⁶
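The claim that these constants force a one-microsecond time quantum is easy to check arithmetically. The sketch below uses only the constants quoted in the text (40 µs, 781 µs, $2 \cdot 1562 \cdot i$ µs, 25 ms and 0.5 s). The full set of constants in figure 5 is not reproduced, so this is illustrative only, and the variable names are ours.

```python
from functools import reduce
from math import gcd

SAMPLE_PERIOD_US = 781        # the bus is sampled twice every 781 microseconds
UNIT_US = 1562                # message i lasts 2 * 1562 * i microseconds
JAM_US = 25_000               # jamming signal: 25 milliseconds

message_durations = [2 * UNIT_US * i for i in range(1, 6)]   # the 5 message types
constants = [40, SAMPLE_PERIOD_US, JAM_US, 500_000] + message_durations

# The coarsest time unit in which every constant is a whole number of ticks.
time_quantum = reduce(gcd, constants)
```

Since gcd(40, 781) is already 1, any discretization must count single microseconds, which is the source of the state explosion discussed in the footnote.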

Verification. The protocol must ensure collision detection, that is, if a frame 
sent by a sender is destroyed by the other sender (collision), then both senders 
shall detect this. According to [HSLL97], collision happens when the boolean 
expression 



-(A.p/ A.Si A A.Pn 4A A.S 2 ) 

⁵ The observer automaton shown in the figure is not part of the system itself, but is added to monitor the system for possible errors, as we explain below.

⁶ Notice that duration constants vary from 40 microseconds to 0.5 seconds and have no common divisor other than 1 (see figure 5). This implies a very small time quantum, namely, one microsecond, which results in very large constants in guards and invariants. Consequently, enumerative approaches based on discretization are not well-suited for this case study, since time units have to be counted one by one, leading to state explosion.
Fig. 5. Bang&Olufsen’s example: the TA for sender A.
evaluates to false at the moment A.S2 is assigned (transition from control state 11 to 12 in figure 5). A collision is detected when the result of the detector automaton (called by signal “Acheck!”) is A.res = 1 or A.res = 2, whereupon the sender emits an “Areset!” signal.

Now we can model the requirement in terms of reachability of the “error” state of the observer automaton shown in figure 6. The observer starts at its leftmost state and moves to its middle state when a collision happens. If the collision is detected before the sender finishes transmitting (modeled by signal “Done!”), then the observer returns to its initial state; otherwise, it goes to the error state.
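The observer’s behavior can be mimicked by a three-state machine over an abstract event trace. The event names below (`"collision"`, `"detected"`, `"done"`) are hypothetical stand-ins for the signals of figure 6, and the sketch ignores timing entirely.

```python
from enum import Enum


class Obs(Enum):
    IDLE = "initial"
    COLLIDED = "middle"
    ERROR = "error"


def observe(events):
    """Run the observer over a trace of abstract events; ERROR is absorbing."""
    state = Obs.IDLE
    for ev in events:
        if state is Obs.IDLE and ev == "collision":
            state = Obs.COLLIDED             # a collision has happened
        elif state is Obs.COLLIDED and ev == "detected":
            state = Obs.IDLE                 # detected in time: back to the initial state
        elif state is Obs.COLLIDED and ev == "done":
            state = Obs.ERROR                # transmission finished, collision undetected
    return state
```

Reachability of the error state in the composed system then corresponds to a trace that drives this machine to `Obs.ERROR`.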



Fig. 6. Bang&Olufsen’s example: the observer automaton.



Results and performance. [HSLL97] present two versions of the protocol: the initial version contains an error (Uppaal provides an abstract counter-example); then, the frame-generator automaton is slightly modified and the authors of [HSLL97] claim this version to be correct. However, we have found a counter-example in both versions. The counter-example for the “corrected” version is generated by Kronos in 25 seconds on a Sparc 20.⁷

The complete diagnostic run is 1951 discrete/timed steps long. Here, we show only its head and its tail, as produced by the tool:⁸



<> - 0 - <> 

<y:0> - 40 - <y:40> 
<x:0, y:40> - 741 - 
<x:741, y:781> - 40 
<x:781, y:40> - 0 - 
<x:781, y:40> - 2303 





” y: 


=0 


"b_go" — > 




— X : 


=0 


A 

1 

1 

0 

bO 

1 


<x:741, y:781> 


” y: 


=0 


"b_start_frELme" — > 


- <x:781, y:821> 


— "a 


_silent" — > 


<x:781, y:40> 


— X : 


=0 


"a_start_frELme" — > 



<x:3084, y:2343> 



"b_silent" 



⁷ As in [HSLL97], in order to obtain a fast answer, we have used a simplified model in which not the whole variety of messages can be generated.

⁸ There are too many discrete variables; thus, only the clock valuation is shown for each state. Clocks x and y correspond to senders A and B, respectively. The initial valuation is trivial since no clocks are initially active. In the second valuation, only y is active.
<x;2303

<x;2303 

<x;2343 

<x;741, 

<x;781, 



, y;2343> - 0 - <x:2303, y:2343> — y:=0 "b_one" — > 
, y;2343> - 40 - <x;2343, y:2383> — x:=0 "a_one" — > 
, y;40> - 741 - <x:3084, y:781> — y:=0 "b_one" — > 
y:781> - 40 - <x:781, y;821> — x:=0 "a_one" — > 

y:40> - 741 - <x:1522, y:781> — y:=0 "b_frame" — > 



<x:741 


y:781> - 40 


- <x:781, y:821> 


<x:781 


y:40> - 0 - 


<x:781, y:40> 


<x:781 


y:40> - 0 - 


<x:781, y:40> 


<x:781 


y:40> - 741 


- <x:1522, y:781> 


<x:741 


y:781> - 0 


- <x:741, y:781> 


<x:741 


y:781> - 0 


- <x:741, y:781> 


<x:741 


y:781> - 40 


- <x:781, y:821> 


<x:781 


y:40> - 0 - 


<x:781, y:40> 


<x:781 


y:40> - 0 - 


<x:781, y:40> 


<x:781 


y:40> - 0 - 


<x:781, y:40> 


<x:781 


y:40> - 40 


- <x:821, y:80> 


<x:40, 


y:80> - 0 - 


<x:40, y:80> 


<x:40, 


y:80> - 0 - 


<x:40, y:80> 



— "b_observe_ok" — > 

— "b_stopped" — > 

— x:=0 "a_zero" — > 

— "a_diff_pf_sl" — > 

— "a_stopped" — > 

— y:=0 "b_nocol" — > 

— "b_pf0" — > 

— "b_zero" — > 

— "b_new_pn" — > 

— x:=0 "a_nocol" — > 

— "a_pf0" — > 

— "a_zero" — > 

— "a_new_pn" — > 



Intuitively, the error seems to be due to the following reasons: 

1. The two senders start transmitting with a difference of exactly 40 µs. Due to this fact and the way the sampling of the bus is performed, the collision remains undetected until the last message of the frame is sent.

2. In the last message of the frame (a message signaling end-of-frame), the collision detection procedure is disarmed. This can be seen in the tail of the diagnostic run above: instead of the action a_check calling the collision detection procedure, we see the action a_stopped, which means that the boolean variable A.stop is set. Therefore, the collision is not detected by A. The situation is the same for sender B.



5 Conclusions 

We have shown how to compute exact diagnostics for timed systems with re- 
spect to reachability properties. Our technique has enhanced the verification 
tool Kronos with a useful feature, which makes debugging of the model and 
discovery of real system flaws much easier. 

Related work. Timed diagnostics have been considered independently in [LPY95]. However, that work only states the existence of a run inscribed in a symbolic path, and gives no method for actually extracting the run. Moreover, the symbolic reachability of [LPY95] does not contain the c-closure operation. This makes the extraction of runs simpler, but without c-closure, termination is not guaranteed in general.
Recently, [AKV98] have developed an algorithm which, given a sequence of edges, produces a corresponding run, if one exists. This algorithm has the same complexity as ours, O(l · n²), and can also be used to extract a timed diagnostic from a symbolic path.

Acknowledgments. This work would not have been possible without the help of Marius Bozga, who extended Kronos with discrete variables. I would also like to thank him for his help in understanding the counter-example trace in Bang & Olufsen's protocol.
A Modeling “committed locations”

Committed locations are simply discrete states modeling atomic execution. Informally, the semantics for a network of TA are as follows. When an automaton A enters a committed location, it has to exit immediately, and no other automaton takes a discrete step in the meantime.

To model committed locations, we introduce a global boolean variable atom and a global clock z (a single boolean variable and a single clock suffice, no matter how many committed locations there are). The invariant that must hold during execution is that atom is set iff some automaton is in a committed location, and that the time spent in committed locations is zero.

For each TA A in the global system, if e is an edge of A from location q to location q', then:

- If q is not committed, then we add the boolean guard ¬atom to e.

- If q is committed, then we add the clock guard z = 0 to e.

- If q' is committed, then we add the assignment atom := true and the clock reset z := 0 to e.

- If q' is not committed, then we add the assignment atom := false to e.

The construction is illustrated in Figure 7.



[Figure: an automaton A with a committed location; its edges are annotated with the added guards ¬atom and z = 0 and the assignment atom := false.]

Fig. 7. Modeling committed locations with an auxiliary boolean variable and clock.
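The edge transformation described in this appendix can be sketched as a small program. This is a minimal illustration under our own assumptions: edges are represented by a hypothetical Edge record whose guards and updates are kept as plain strings, not by any Kronos data structure, and encode_committed is an illustrative name.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Hypothetical edge record: source/target locations, guards and updates as text."""
    source: str
    target: str
    guards: list = field(default_factory=list)   # conjunction of guard strings
    updates: list = field(default_factory=list)  # assignments and clock resets


def encode_committed(edges, committed):
    """Return copies of the edges with the atom/z guards and updates added."""
    result = []
    for e in edges:
        guards, updates = list(e.guards), list(e.updates)
        if e.source in committed:
            guards.append("z = 0")        # no time may elapse in a committed location
        else:
            guards.append("not atom")     # blocked while some automaton is committed
        if e.target in committed:
            updates += ["atom := true", "z := 0"]
        else:
            updates.append("atom := false")
        result.append(Edge(e.source, e.target, guards, updates))
    return result
```

Only one boolean and one clock are introduced, however many locations are marked committed, which mirrors the construction above.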
Abstract. The i-protocol, an optimized sliding-window protocol for GNU UUCP, came to our attention two years ago when we used the Concurrency Factory’s local model checker to detect, locate, and correct a non-trivial livelock in version 1.04 of the protocol. Since then, we have repeated this verification effort with five widely used model checkers, namely, COSPAN, Murφ, SMV, Spin, and XMC. It is our contention that the i-protocol makes for a particularly compelling case study in protocol verification and for a formidable benchmark of verification-tool performance, for the following reasons: 1) The i-protocol can be used to gauge a tool’s ability to detect and diagnose livelock errors. 2) The size of the i-protocol’s state space grows exponentially in the window size, and the entirety of this state space must be searched to verify that the protocol, with the livelock error eliminated, is deadlock- or livelock-free. 3) The i-protocol is an asynchronous, low-level software system equipped with a number of optimizations aimed at minimizing control-message and retransmission overhead. It lacks the regular structure that is often present in hardware designs. In this sense, it provides any verification tool with a vigorous test of its analysis capabilities.



1 Introduction 

Model checking [CGP99] is a verification technique aimed at determining whether a system specification possesses a property expressed as a temporal logic formula. Model checking has enjoyed wide success in verifying real-life systems and in finding design errors in them. An interesting account of a number of these success stories can be found in [CW96].

In this paper, we report on our experience in using model checking, as provided by six widely used verification tools, to detect and correct a non-trivial livelock in a bidirectional sliding-window protocol. The tools in question are the Concurrency Factory [CLSS96], COSPAN [HHK96], Murφ [Dil96], SMV [CMCHG96], Spin [HP96], and XMC [RRR+97], each of which supports some variety of model checking.

The protocol that we investigate, the i-protocol, is part of the GNU UUCP package, available from the Free Software Foundation, and is used for file transfers over serial lines. The i-protocol is part of a protocol stack; its purpose is to ensure ordered, reliable, duplex communication between two sites. At its lower interface, the i-protocol assumes unreliable (lossy) packet-based FIFO connectivity. To its upper interface, it provides reliable packet-based FIFO service. A distinguishing feature of the i-protocol is the rather sophisticated manner in which it attempts to minimize control-message and retransmission overhead. The GNU UUCP package also contains the g- and j-protocols, which are variants of the i-protocol.

A problem with the i-protocol, GNU UUCP version 1.04, was first noticed by Stark while trying to transfer large files from a remote computer to his home PC over a modem line. In particular, it appeared that, under certain message-loss conditions, the protocol would enter a “confused” state and eventually drop the connection. In order to diagnose this problem, we extracted an abstract version of the i-protocol from its source code, which consists of approximately 1500 lines of C code. We formalized this abstraction of the protocol in VPL (Value Passing Language), the input language of the Concurrency Factory specification and verification toolset.

The VPL source of the i-protocol was then subjected to a series of model checking experiments using the Concurrency Factory’s local model checker for the modal mu-calculus [RS97]. This led us to the root of the problem: a livelock that occurs when a particular series of message losses drives the protocol into a state where the communicating parties enter into a cycle of fruitless message exchanges without any packets being delivered to the upper layer entities. Seeing no progress, the two sides close the connection, which must then be reestablished. If the communication line is sufficiently noisy, or if one of the sides is slow in emptying communication buffers (say, due to disk waits) so that the buffers overflow, the chances of this scenario recurring are high, and the result can be extremely poor performance.

Using the Concurrency Factory’s diagnostic facility, we were able to pinpoint and subsequently “patch” the bug in the VPL code. The fix to the protocol consists of a simple change in the way negative acknowledgments are handled. The livelock error was fixed independently by Ian Taylor, the i-protocol’s original developer, in GNU UUCP version 1.05.

We repeated our model-checking-based verification of the i-protocol with the COSPAN, Murφ, Spin, SMV, and XMC verification tools, so that we could draw some comparisons between these tools on a real-life protocol. The i-protocol is particularly compelling as a case study in protocol verification and as a verification-tool performance benchmark for several reasons. First, the version we originally model checked has a bug, i.e., the livelock error, and hence the protocol can be used to gauge a tool’s ability to uncover errors of this nature. In this case, we are more interested in debugging or refutation than in verification.
Second, the size of the i-protocol’s state space grows exponentially in the window size, and the entirety of this state space must be searched to verify that the protocol, with the livelock error eliminated, is deadlock- or livelock-free. Finally, the i-protocol is an asynchronous, low-level software system equipped with a number of optimizations aimed at minimizing control-message and retransmission overhead. It lacks the regular structure that is often present in hardware designs. In this sense, it provides any verification tool with a vigorous test of its analysis capabilities.

Our experimental results show that the special-purpose cycle-detection algorithms of Spin and COSPAN can be used to significant advantage to check for livelocks in complex systems like the i-protocol. SMV exhibited excellent memory-usage performance on all runs of window size 1, but failed to complete in a reasonable amount of time on any run of window size 2. This can most likely be attributed to exponential blowup in the BDD representation for window sizes greater than 1. Murφ and XMC performed the best on the i-protocol. In the case of Murφ, this is due to the low-level nature of its specification language (guarded commands) and the succinct manner in which system states are encoded. XMC’s strong performance is a consequence of the efficiency of the underlying tabled logic programming system, XSB [XSB97], and of our use of partial evaluation to specialize the logical formula capturing livelock to the i-protocol’s behavior. Our model-checking results are described more fully in Section 5 (see Table 1).

In related work, [CCA96,Cor96] benchmark the performance of a variety of model checkers (including SMV and Spin) on Ada tasking programs. The major differences between our study and theirs are in the application domain (a real-life communication protocol vs. a suite of concurrency analysis benchmark programs) and in the type of properties considered (livelock vs. reachability).

The remainder of the paper is organized as follows. Section 2 describes the 
salient features of the tools used in this case study. Section 3 gives a detailed 
account of the i-protocol, with an emphasis on how we modeled the protocol for 
verification purposes. Section 4 describes the livelock that we discovered, and 
shows how a small change to the protocol effectively eliminates this form of livelock. Section 5 summarizes the results of our model-checking experiments, and
offers a comparison of the tools’ performance. Section 6 contains our concluding 
remarks. 

We have constructed a web site (http://www.cs.sunysb.edu/~lmc/iproto/) to serve as a central repository for our results. The site contains the source code of version 1.04 of the i-protocol, the patch to the C code to fix the livelock error, the encoding of the protocol in each of the input languages of the six tools, and various performance statistics generated by our benchmarking activity. For each tool, these include the number of states explored, number of transitions traversed, CPU time usage, and memory usage (see Table 1).
2 The Verification Tools

In this section, we describe the most salient features of the tools we used in our 
analysis of the i-protocol. 

2.1 The Concurrency Factory 

In the context of our case study, the main features of the Concurrency Factory [CLSS96] are its textual specification language, VPL, and its local model checker for the modal mu-calculus [RS97]. VPL-supported data structures include integers of limited size and arrays and records composed of such integers. A system specification in VPL is a tree-like hierarchy of subsystems. A subsystem is either a network or a process. A network consists of a collection of subsystems running in parallel and communicating with each other through typed channels. Simple statements of VPL are assignments of arithmetic or boolean expressions to variables, and input/output operations on channels. Complex statements include sequential composition, if-then-else, while-do, and nondeterministic choice in the form of the select statement.

LMC, the Factory’s local model checker, computes in an on-the-fly fashion the product of a graph representation of the formula to be checked with the labeled transition system (guaranteed to be finite-state) underlying the VPL program. The number of nodes of the product graph explored by LMC is further reduced through the use of partial-order reduction. This technique eliminates from consideration those portions of the state space resulting from redundant interleavings of independent events. LMC is also equipped with diagnostic facilities that allow the user to request that the contents of the depth-first search stack be displayed whenever a certain “significant event” occurs (e.g., when the search first encounters a state at which a logical variable is determined to be either true or false), and to play interactive games for the full modal mu-calculus.

2.2 COSPAN 

COSPAN [HHK96] is a model checker for synchronous systems based on the theory of ω-automata. The system to be verified is specified as an ω-automaton P, the task the system is intended to perform is specified as an ω-automaton T, and verification consists of the automata language containment test L(P) ⊆ L(T). P is typically given as the synchronous parallel composition of component processes, specified as ω-automata. Asynchronous composition can be modeled through nondeterministic delay in the components.

Language containment can be checked in COSPAN using either a symbolic (BDD-based) algorithm or an explicit state-enumeration algorithm. Both algorithms are “on-the-fly.” COSPAN also supports a notion of “recur edge” and can check whether in every execution of the system the recur edge occurs infinitely often. We used this facility to detect livelock in the i-protocol.

Systems can be specified in COSPAN using the S/R language, which supports nondeterministic, conditional (i.e., if-then-else) variable assignments; variables of type bounded integer, enumerated, boolean, and pointer; arrays and records; and integer and bit-vector arithmetic. Modular hierarchy, scoping, parallel and sequential execution, homomorphism declaration and general ω-automaton fairness are also available. COSPAN also provides an error tracing facility that allows the user to back-reference from the error track to the S/R source.



2.3 Murφ

The Murφ verification system consists of the Murφ compiler and the Murφ description language. The Murφ compiler generates a special-purpose verifier from a Murφ description. The Murφ description language uses a set of iterated guarded commands, like Chandy and Misra’s Unity language [CM88]. A Murφ description consists of constant and type declarations, variable declarations, procedure declarations, rule definitions, a description of the start state, and a collection of invariants. Each rule is a guarded command consisting of a condition and an action. The condition is a boolean expression and the action is a sequence of statements. An invariant is a boolean expression that is desired to be true in every state. When an invariant is violated, an error message and error trace are generated.

Murφ is able to verify liveness specifications written in a subset of Linear Time Temporal Logic (LTL). Liveness specifications are expressed using the keywords ALWAYS, EVENTUALLY, and UNTIL, and are checked under the assumption that every rule is weak-fair (unless declared otherwise). We used this facility of Murφ to encode and check for livelock in the i-protocol.

2.4 SMV 

SMV [CMCHG96] is an automatic tool for model checking CTL formulas. CTL can also be used to specify simple fairness constraints. The transition relation of the system to be verified is represented implicitly by boolean formulas, and implemented by BDDs. This allows SMV to verify models having more than 10^20 states. SMV also has a diagnostic facility that produces a counterexample when a formula is not true.

An SMV program can be viewed as a system of simultaneous equations whose solution determines the next state. Asynchronous systems, such as the i-protocol, are modeled by defining a set of parallel processes whose actions are interleaved arbitrarily in the execution of the program. As in Murφ, liveness specifications such as absence of livelock are given in temporal logic, in this case CTL.

2.5 Spin 

Spin [HP96] is a model checker for asynchronous systems specified in the language Promela. Safety and liveness properties are formulated using LTL. Model checking is performed on-the-fly and, if specified by the user, with partial-order reduction. Moreover, model checking can be done in a conventional exhaustive manner or, when this proves to be impossible due to state explosion, with an efficient approximation method based on bitstate hashing. With a careful choice of hashing functions, the probability of an exhaustive proof remains very high.
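To make the idea of bitstate hashing concrete, here is a minimal sketch; it is not Spin's implementation. The visited-state set is replaced by a bit array, and a state counts as already visited when all of its hashed bit positions are set. The class name BitstateTable, the use of SHA-256 as the hash family, and the default sizes are our own assumptions. Because distinct states can collide, the search may skip states, which is why the method only approximates an exhaustive search.

```python
import hashlib


class BitstateTable:
    """Bit array standing in for a visited-state set (hypothetical sketch)."""

    def __init__(self, log2_bits=20, k=2):
        self.m = 1 << log2_bits               # number of bits
        self.bits = bytearray(self.m // 8)
        self.k = k                            # number of hash functions

    def _indices(self, state):
        data = repr(state).encode()
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def test_and_set(self, state):
        """Return True if state was (possibly) seen before; mark it as seen."""
        seen = True
        for idx in self._indices(state):
            byte, bit = divmod(idx, 8)
            if not (self.bits[byte] >> bit) & 1:
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def explore(initial, successors, table):
    """Depth-first search that stores states only as bits; returns states expanded."""
    table.test_and_set(initial)
    stack, count = [initial], 0
    while stack:
        state = stack.pop()
        count += 1
        for nxt in successors(state):
            if not table.test_and_set(nxt):
                stack.append(nxt)
    return count
```

The memory cost is fixed by the size of the bit array rather than by the number or size of the states, at the price of possibly missing states after a hash collision.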

Besides being able to specify correctness properties in LTL, the Promela 
specification language includes two types of labels that can be used to define two 
complementary types of liveness properties: acceptance and progress. We used 
Spin’s ability to check for this latter type of formula to detect livelock in the 
i-protocol. 
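A livelock in the sense checked here is a reachable cycle along which no progress is made. As a rough sketch (Spin itself uses a nested depth-first search; we substitute a plain reachability pass followed by cycle detection), the function below checks whether some reachable cycle visits only non-progress states. The names succ and is_progress are hypothetical stand-ins for a model's successor function and its progress labels.

```python
def reachable(initial, succ):
    """All states reachable from initial."""
    seen, stack = {initial}, [initial]
    while stack:
        s = stack.pop()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def has_non_progress_cycle(initial, succ, is_progress):
    """True if some reachable cycle visits no progress state (a livelock)."""
    nodes = {s for s in reachable(initial, succ) if not is_progress(s)}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {s: WHITE for s in nodes}
    for root in nodes:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(succ(root)))]
        while stack:
            s, it = stack[-1]
            for t in it:
                if t not in nodes:
                    continue                   # edge leaves the non-progress subgraph
                if color[t] == GREY:
                    return True                # back edge: cycle of non-progress states
                if color[t] == WHITE:
                    color[t] = GREY
                    stack.append((t, iter(succ(t))))
                    break
            else:
                color[s] = BLACK               # all successors explored
                stack.pop()
    return False
```

Restricting the search to non-progress states first means any cycle found is, by construction, one on which no progress label is ever visited.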

Promela is a nondeterministic guarded-command language with influences from Hoare’s CSP and the language C. Promela includes support for data structures, interrupts, bracketing of code sections for atomic execution, the dynamic creation of concurrent processes, and a variety of synchronous and asynchronous message passing primitives. Message passing is via channels with arbitrary numbers of message parameters.

2.6 XMC 

XMC [RRR+97] is a model checker for a value-passing process calculus and the modal mu-calculus. It is written in under 200 lines of XSB tabled Prolog code. XSB [XSB97] is a logic programming system developed at SUNY Stony Brook that extends Prolog-style SLD resolution with tabled resolution. The principal merits of this extension are that XSB terminates on programs having finite models, avoids redundant subcomputations, and computes the well-founded model of normal logic programs.

Systems to be verified in XMC are encoded in the XL language, a value-passing language similar in many ways to Milner’s CCS. A distinguishing feature of XL is its support for a generalized process prefix operator, which allows arbitrary Prolog terms to appear as prefixes. This construct allows the XL programmer to take advantage of XSB’s substantial data-structuring facilities to describe sequential computation on values.

Properties such as the possibility of livelock are expressed as modal mu-calculus formulas. The encoding of the semantics of the mu-calculus in XMC can be specialized [JGS93] with respect to a given formula. For the livelock formula used in the verification of the i-protocol, specialization yields a logic program that implements an efficient cycle-detection algorithm, and leads to improved performance.



3 Modeling the i-Protocol 

In this section we introduce the i-protocol, and describe how we modeled it for 
verification purposes. 

The i-protocol is a sliding window protocol, but with some optimizations, described later, aimed at reducing the acknowledgment and retransmission traffic. The window size, among other “steady-state” protocol parameters such as data packet size, line quality and error handling parameters, timeout values, acknowledgment high watermarks, and data and message buffer sizes, is decided at the parameter negotiation stage during connection set-up. Since we are concerned with the data transfer properties of the protocol, we do not model the stages involved in connection set-up, parameter negotiation, error and line-quality monitoring, and connection shutdown. In particular, the window size for our model is a parameter that is fixed at “compile time.”

The protocol is intended to provide reliable, full duplex, FIFO service to its upper interface, given a full duplex, unreliable, FIFO packet-based communication service by its lower interface. It is convenient to imagine each side as consisting of two halves: a sender half that sends data packets to, and receives acknowledgments from, the receiver half on the other side, and a receiver half that receives data packets from, and sends acknowledgments to, the sender half on the other side. To allow for communication latency, the sender can send several packets without waiting for acknowledgments. If the window size is W, then the sender can have up to W contiguous packets unacknowledged at any time. These packets are stamped with sequence numbers when received from the upper layer; sequence numbers range from 0 to SEQ − 1.

The i-protocol, as implemented in GNU UUCP, uses a fixed value of SEQ = 32, and is intended for window sizes up to, but not exceeding, 16. As discussed below, however, this bound is not essential: with a sequence space of size SEQ, a window size of up to ⌊SEQ/2⌋ can be supported.
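Modular window membership, and one way to see why ⌊SEQ/2⌋ is the limit, can be sketched as follows; the helper names are ours. The sketch relies on the usual sliding-window argument, which we assume applies here: the window the receiver has just left and the window it moves to must not overlap, so that a late retransmission of an old packet cannot be mistaken for a new one.

```python
def in_window(seq, lower, w, SEQ):
    """True if seq is one of lower+1, ..., lower+w (mod SEQ); lower is exclusive."""
    return 1 <= (seq - lower) % SEQ <= w


def windows_disjoint(w, SEQ):
    """Check that window {1..w} and the next window {w+1..2w} (mod SEQ) do not overlap."""
    old = {i % SEQ for i in range(1, w + 1)}
    new = {(w + i) % SEQ for i in range(1, w + 1)}
    return old.isdisjoint(new)
```

The two windows stay disjoint exactly when 2W ≤ SEQ, which is also why the model can shrink the sequence space to 2W.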

To cut down on the acknowledgment traffic, the receiver can piggyback its acknowledgments on top of normal data or other control traffic. When both sides are exchanging data packets, this is often sufficient to keep the connection going without the need for explicit acknowledgments. However, when a side is only receiving data, it needs to send explicit ACKs. In this case, as an optimization, ACKs are sent only at half-window boundaries, i.e., one for every ⌈W/2⌉ packets received.
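The half-window rule can be sketched as a predicate on the receiver's counters. This sketch assumes the half-window threshold is ⌈W/2⌉ (so that with W = 1 every packet is acknowledged) and measures the packets not yet acknowledged as the modular distance from lack to recseq; the function names are hypothetical.

```python
def half_window(w):
    """Explicit-ACK threshold, ceil(w / 2); equals 1 when w == 1."""
    return (w + 1) // 2


def explicit_ack_due(recseq, lack, w, SEQ):
    """True if at least half_window(w) packets were delivered since the last ACK."""
    unacked = (recseq - lack) % SEQ
    return unacked >= half_window(w)
```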

Below we give a more detailed account of the i-protocol. The interested reader may also refer to the (VPL-style) pseudo-code on the web site.

The “sender half” uses the following main state variables, each of which ranges over the sequence numbers 0 to SEQ − 1. A variable sendseq is used to stamp the next user-level message from the upper layer. Its value gives the upper edge (exclusive) of the sender’s “active window.” The variable rack is used to keep track of acknowledgments from the remote, and its value gives the lower edge (exclusive) of the sender’s active window. At our level of abstraction, the data contents of a packet are not modeled, and so the sender does not explicitly buffer unsent messages¹.

The main data structures used by the receiver half are as follows. A variable recseq records the sequence number up to and including which all packets have been successfully received from the remote and delivered to the upper layer. The variable lack records the sequence number up to which an acknowledgment, either explicit (via an ACK) or implicit (via a piggybacked acknowledgment in a data or NAK packet), has most recently been sent to the remote. The receiver’s active window consists of the sequence numbers from lack + 1 through lack + W (modulo SEQ)². A boolean array recbuf of size SEQ indicates the sequence numbers in this window that have been received (out of order) and are being buffered for returning to the upper layer. This buffering is required in order to deliver packets in the correct order to the upper layer. Another boolean array nakd is used to remember the sequence numbers that have recently been negatively acknowledged. As in the case of the sender, the receiver does not explicitly buffer packets, recording only whether a message has been received from the remote but not yet delivered to the upper layer.

¹ This is a data independence abstraction [Wol86].
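The state variables of one side can be collected in a record, initialized as the protocol's initialization code does (sendseq = 1, the other counters 0, both arrays all false). The sketch below is ours, not the VPL model from the web site; the class name Side and the helper send_window_size are hypothetical. It uses the convention above that both edges of the send window are exclusive, so the active window holds rack + 1 through sendseq − 1.

```python
from dataclasses import dataclass, field


@dataclass
class Side:
    """Hypothetical record of one side's sender-half and receiver-half variables."""
    W: int                  # window size
    SEQ: int                # size of the sequence space
    sendseq: int = 1        # next number to stamp: upper edge (exclusive) of send window
    rack: int = 0           # acknowledged by the remote: lower edge (exclusive)
    recseq: int = 0         # every packet up to and including this one was delivered
    lack: int = 0           # last sequence number acknowledged to the remote
    recbuf: list = field(default_factory=list)   # received out of order, not delivered
    nakd: list = field(default_factory=list)     # NAKed since the last timeout

    def __post_init__(self):
        # All entries of recbuf and nakd start out false.
        self.recbuf = [False] * self.SEQ
        self.nakd = [False] * self.SEQ

    def send_window_size(self):
        """Number of unacknowledged packets rack+1 .. sendseq-1 (mod SEQ)."""
        return (self.sendseq - self.rack - 1) % self.SEQ
```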

The protocol initialization code sets lack, rack and recseq to 0, sendseq to 
1, and all entries in the arrays nakd and recbuf to false. The protocol’s main 
loop consists of busy waiting for one of the following events to occur, and taking 
appropriate actions as described: 

(E1): a packet arrival over the communication link (lower layer interface): the packet is first checked for header checksum errors, and silently discarded if it has a header error. Otherwise, if the piggybacked acknowledgment is for a sequence number in the sender’s active window, it is used to update rack. This subsumes the handling of explicit ACK packets. If the received packet is a NAK for a sequence number in the sender’s active window, the requested data packet is resent. If the received packet is a data packet, its data checksum is first verified. If the data is found corrupted, and the packet’s sequence number is in the receiver’s active window, has not been previously received, and has not been negatively acknowledged since the previous timeout, then a NAK is sent for that sequence number. If, on the other hand, the data is valid, and the packet number is the first in its active window (i.e., it bears the sequence number recseq + 1), then the newly arrived packet is delivered to the upper layer. Furthermore, any later packets that have been buffered, and all of whose “predecessors” have been delivered to the upper layer, are also returned, in order, to the upper layer. At each point, recseq is appropriately incremented, thus shifting up the active window.

If it is subsequently found that ⌈W/2⌉ or more packets have been received since the last ACK (implicit or explicit) was sent, an explicit ACK is generated for recseq, and lack is updated accordingly. If, however, the sequence number of the newly arrived data packet is not equal to recseq + 1, meaning that some sequence numbers in between are missing, the newly arrived packet is buffered (in recbuf), if not already received, and NAKs are generated for all “earlier” missing packets for which a NAK has not been sent since the last timeout.

(E2): a user request to send a new message (upper layer interface): The sender first checks if there is an opening in its active window (i.e., that the active window size is less than W). If there is an opening, the new message is transmitted after being assigned the next new sequence number (sendseq), and the “upper edge” of the sender’s active window is adjusted accordingly. If, however, the sender’s window is full, it must wait for an opening (created by the receipt of an ACK, see above) before it can send the new message. In this case, it busy-waits in a loop, waiting for the arrival of a new packet (see (E1) above), or for the occurrence of a timeout (see (E3) below).

² Henceforth, unless explicitly specified otherwise, we shall assume that all arithmetic is modulo SEQ.

(E3): a timeout: The nakd buffer is first cleared, signaling that fresh NAKs may need to be sent out. If there is no packet in the receive buffer (from the lower interface), then the receiver sends a NAK for the “earliest” missing sequence number (recseq + 1) in its active window. Further, the sender resends the “oldest” message (if one exists in its active window) for which it has not received an acknowledgment from the remote. If, on the other hand, there is a packet available from the lower interface, we follow (E1) above.
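The three event handlers can be sketched as pure functions over the state variables introduced above. All function names, argument orders, and the (kind, seq) packet tuples are our own hypothetical choices. As simplifying assumptions, window membership for incoming data is tested relative to recseq (the window recseq + 1 through recseq + W), packet contents and checksums are not modeled, and all arithmetic is modulo SEQ. The send window uses the exclusive edges rack and sendseq, so (sendseq − rack − 1) mod SEQ packets are outstanding.

```python
def receive_data(seq, recseq, recbuf, nakd, W, SEQ):
    """(E1), valid DATA packet: return (new_recseq, delivered, naks).

    recbuf and nakd are boolean lists of length SEQ, updated in place.
    """
    delivered, naks = [], []
    if not 1 <= (seq - recseq) % SEQ <= W:
        return recseq, delivered, naks          # outside the receive window: ignore
    if seq == (recseq + 1) % SEQ:
        # In order: deliver it, then any buffered packets that now follow.
        delivered.append(seq)
        recseq = seq
        nxt = (recseq + 1) % SEQ
        while recbuf[nxt]:
            recbuf[nxt] = False
            delivered.append(nxt)
            recseq = nxt
            nxt = (recseq + 1) % SEQ
    elif not recbuf[seq]:
        # Out of order: buffer it and NAK earlier gaps not NAKed since the timeout.
        recbuf[seq] = True
        missing = (recseq + 1) % SEQ
        while missing != seq:
            if not recbuf[missing] and not nakd[missing]:
                nakd[missing] = True
                naks.append(missing)
            missing = (missing + 1) % SEQ
    return recseq, delivered, naks


def on_ack(ack, sendseq, rack, SEQ):
    """(E1), piggybacked or explicit ACK: advance rack if ack is in the send window."""
    if 1 <= (ack - rack) % SEQ <= (sendseq - rack - 1) % SEQ:
        return ack
    return rack


def try_send(sendseq, rack, W, SEQ):
    """(E2): return (new_sendseq, stamped_seq), or (sendseq, None) if the window is full."""
    if (sendseq - rack - 1) % SEQ >= W:
        return sendseq, None                    # wait for an ACK or a timeout
    return (sendseq + 1) % SEQ, sendseq


def on_timeout(recseq, rack, sendseq, nakd, SEQ, rx_pending=False):
    """(E3): clear nakd and return the packets to transmit as (kind, seq) tuples."""
    for i in range(len(nakd)):
        nakd[i] = False                         # fresh NAKs may be sent again
    if rx_pending:
        return []                               # a packet is waiting: handle it as in (E1)
    out = [("NAK", (recseq + 1) % SEQ)]
    if (sendseq - rack - 1) % SEQ > 0:          # some packet is still unacknowledged
        out.append(("DATA", (rack + 1) % SEQ))
    return out
```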

Our model of the i-protocol was derived from the C code of the implementation, and involved a number of abstractions aimed at reducing the protocol’s
state space. One such abstraction reduces the message sequence space from a 
fixed value of SEQ = 32 (a defined constant in the GNU implementation) to 
the value 2W when using a window size of W. Indeed, with a sequence space of 
SEQ = 32, a system consisting of just the receiver half of the protocol on one 
side and the sender half of the protocol on the other, connected by a single-buffer 
communication medium in either direction, would have an estimated state space 
of about 2.7 x 10^^, even with a window size of 1. In actuality, though, many of 
these configurations are observationally equivalent [Mil89] to one another, and 
by using a sequence space of 2W, this number can be reduced. For instance, for the case W = 1, the estimated state space shrinks dramatically to about 1.6 x 10^,
a reduction by almost a factor of 10^. 

Figure 1 shows how the i-protocol is modeled, namely as an asynchronous 
system of four processes, a sender, a receiver, and two medium units SR and 
RS. The medium units are modeled, in the usual manner, as lossy FIFO buffers. 
The packets sent over the medium can be data, ack or nak packets. Each packet 
has a data and header checksum field, which are nondeterministically reset by 
the medium to model corruption of the data or header. 
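The nondeterministic choices of a medium unit can be enumerated explicitly, much as an explicit-state model checker does when computing successors. In this sketch (our own encoding, not any tool's input language), a medium is a bounded FIFO tuple of Packet records; a packet handed to a full buffer is assumed to be lost, and corruption is enabled only for the “full” medium of Section 5. The names Packet and medium_accept are hypothetical.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    kind: str              # "DATA", "ACK" or "NAK"
    seq: int
    hdr_ok: bool = True    # header checksum intact
    data_ok: bool = True   # data checksum intact


def medium_accept(buf, pkt, capacity, corrupt=True):
    """All possible medium buffers after pkt is handed to the medium."""
    outcomes = [buf]                                   # the packet may be lost
    if len(buf) < capacity:
        outcomes.append(buf + (pkt,))                  # transmitted intact
        if corrupt:                                    # 'full' medium only
            outcomes.append(buf + (pkt._replace(hdr_ok=False),))
            outcomes.append(buf + (pkt._replace(data_ok=False),))
    return outcomes
```

Each returned buffer is a separate successor that the search must explore, which is one reason the “full” medium yields larger state spaces than the “mini” one.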



[Figure: the Sender and Receiver processes, connected by the medium units SR and RS.]

Fig. 1: The system verified.
The sender and receiver processes contain local variables corresponding to the data structures described above for the sender and receiver halves. These processes, as modeled in the Concurrency Factory and Spin, can be regarded as direct translations of the pseudo-code discussed above. This is possible since the Factory and Spin are designed to work for asynchronous systems, and their input languages provide data structures, complex control statements and typed communication channels. Murφ and XMC are also designed for verifying asynchronous systems, but modeling the i-protocol in these tools requires more effort: in Murφ the entire transition system of the i-protocol has to be encoded, while in XMC it has to be encoded as a set of CCS-like expressions. Modeling the i-protocol in COSPAN and SMV requires the most effort, as the inputs to these tools are similar to finite-state machines. Also, these tools are designed for the verification of synchronous systems, with extensions for asynchronous systems.

Once the i-protocol has been modeled in each tool, various properties of the 
protocol can be checked, including deadlock-freedom, eventual message delivery, 
and livelock-freedom. In this paper, however, we will only present data relevant 
to livelock detection. 



4 Livelock Error 



The livelock error, detected first using the Concurrency Factory and subsequently using COSPAN, Murφ, SMV, Spin and XMC, is illustrated in Figure 2 for the case of W = 2, medium buffer capacity 1, and assuming that one side acts as sender and the other as receiver. Initially, DATA1 sent by the sender is successfully received by the receiver, which responds with ACK1. This ACK is dropped by the medium. The sender then sends DATA2, which is also lost. The sender then enters its timeout procedure, sending NAK1 and resending DATA1. These (and all subsequent packets) are correctly delivered by the medium. Meanwhile, the receiver also times out, but, finding the messages NAK1 and DATA1 in its receive buffer, it processes them. However, it silently ignores NAK1, since it has never sent a data packet with sequence number 1. It also ignores DATA1, since 1 is not in its current receive window. This cycle can now repeat forever, with the sender sending messages to the receiver, which the receiver ignores, resulting in no messages being accepted from, or delivered to, the upper layer, in spite of the medium behaving perfectly from this point onwards.

The livelock error arises because there is no flow of information from the 
receiver to the sender regarding the sequence numbers up to which the receiver 
has received all messages. A simple fix for this problem consists of sending an 
up-to-date ACK, on the receipt of a NAK for sequence number sendseq, provided 
that the active send window is empty. With this fix the model checker was unable 
to find any livelocks in the protocol. 
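The scenario and the fix can be replayed deterministically with a few counters. The sketch below is a simplification under our own assumptions: it uses SEQ = 4 (the 2W sequence space for W = 2), starts right after ACK1 and DATA2 have been lost, models only the packets sent on the sender's timeouts, accepts only in-order DATA at the receiver, and returns the sender's rack together with the receiver's recseq. The function name replay is hypothetical.

```python
def replay(fixed, rounds=3, SEQ=4):
    """Replay the timeout cycle of Fig. 2; return (sender rack, receiver recseq)."""
    # Sender side: sent DATA1 and DATA2, neither acknowledged; received nothing.
    s_recseq, s_rack, s_sendseq = 0, 0, 3
    # Receiver side: delivered DATA1; never sent DATA of its own.
    r_recseq, r_rack, r_sendseq = 1, 0, 1
    for _ in range(rounds):
        # Sender times out: NAK its earliest missing seq, resend its oldest DATA.
        packets = [("NAK", (s_recseq + 1) % SEQ), ("DATA", (s_rack + 1) % SEQ)]
        replies = []
        for kind, seq in packets:
            send_window_empty = (r_sendseq - r_rack - 1) % SEQ == 0
            if kind == "NAK" and fixed and send_window_empty and seq == r_sendseq:
                replies.append(("ACK", r_recseq))      # the proposed fix
            elif kind == "DATA" and seq == (r_recseq + 1) % SEQ:
                r_recseq = seq                         # in-order DATA is delivered
            # Any other packet is ignored, as in the original protocol.
        for kind, seq in replies:
            # An ACK inside the sender's active window advances rack.
            if 1 <= (seq - s_rack) % SEQ <= (s_sendseq - s_rack - 1) % SEQ:
                s_rack = seq
    return s_rack, r_recseq
```

Without the fix, every round is ignored and nothing moves; with it, the up-to-date ACK reopens the sender's window, DATA2 is resent and delivered, and both counters advance.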







Yifei Dong et al. 



[Figure: message sequence chart between Sender and Receiver. Sender: Send DATA1. Receiver: Receive DATA1; generate ACK1 (packet dropped by medium). Sender: Send DATA2 (packet dropped). Sender timeout: Send NAK1; resend DATA1. Receiver timeout: Receive NAK1, ignored since not in active send window; receive DATA1, ignored since not in active receive window. The portion between the dotted lines repeats.]

Fig. 2: An error scenario illustrating a livelock in the original version of the i-protocol.



5 Model-Checking Results 

As discussed in the Introduction, the i-protocol makes for a formidable case study for verification tools, and forms the basis for an interesting comparative study. Table 1 contains the performance data obtained by applying COSPAN (version 8.15), Murφ (version 3.0), SMV (version 2.4), Spin (version 2.9.7), and XMC to the i-protocol. Results are given for W = 1 and W = 2, with the livelock error present (¬fixed) and not present (fixed), and with a medium that can only drop messages (mini) versus one that can also corrupt messages (full). All results were obtained on an SGI IP25 Challenge machine with 16 MIPS R10000 processors and 3GB of main memory. Each individual execution of a verification tool, however, was carried out on a single processor with 1.9GB of available main memory.

A few comments about Table 1 are in order. On some runs, memory was
exhausted before the verification effort could complete; this is indicated in the
“Completed?” column. The timing figures given in the table are wall-clock
time rather than CPU time. This makes a difference in exactly one instance,
W = 2/full/fixed for XMC, where 4.7GB of virtual memory are used; in this
case, the wall-clock time is perceptibly higher than the CPU time. Some table
entries are left blank, either because the tool does not provide the corresponding
data (e.g., the number of transitions, in the case of SMV) or because the tool
failed to terminate on the run in question. The number of states reported by
SMV is the total number of reachable states; the other tools give the number of
explored states.

Finally, the results for the Concurrency Factory are not included in the table.
Although the Factory was the tool we used to first detect and diagnose livelock
in the i-protocol, and it was able to do this for both window sizes 1 and 2,
its CPU time usage was in general significantly higher than that of the
other model checkers.† This situation should improve with the new release of
the Factory, planned for June 1999.

Version          Tool    Completed?  States   Transitions  Memory(MB)  Time(min:sec)
W=1 mini ¬fixed  COSPAN  Yes         63K      204K         4.9         0:41
                 Murφ    Yes         3K       8K           0.1         0:01
                 SMV     Yes         24.5M                 4.0         41:52
                 Spin    Yes         425      768          749         0:10
                 XMC     Yes         341      571          5           0:01
W=1 mini fixed   COSPAN  Yes         1.5M     5.9M         116         24:21
                 Murφ    Yes         7K       19K          0.3         0:06
                 SMV     Yes         27.7M                 5.3         74:43
                 Spin    Yes         322K     1M           774         0:31
                 XMC     Yes         3K       12K          78          0:17
W=2 mini ¬fixed  COSPAN  Yes         154K     486K         13          1:45
                 Murφ    Yes         45K      122K         2           0:21
                 SMV     No
                 Spin    Yes         35K      71K          751         0:12
                 XMC     Yes         1034     1839         11          0:02
W=2 mini fixed   COSPAN  Yes         11.3M    42.7M        906         619:49
                 Murφ    Yes         91K      240K         4           1:37
                 SMV     No
                 Spin    Yes         1.9M     6M           905         2:28
                 XMC     Yes         20K      74K          475         1:49
W=1 full ¬fixed  COSPAN  Yes         116K     345K         9.1         17:03
                 Murφ    Yes         54K      205K         2           0:25
                 SMV     Yes         425.3M                6.0         201:04
                 Spin    Yes         5.2K     10.1K        749         0:11
                 XMC     Yes         961      1521         9           0:01
W=1 full fixed   COSPAN  No
                 Murφ    Yes         124K     458K         6
                 SMV     Yes         583.3M                9.8
                 Spin    Yes         12.6M    44.9M        1713        17:50
                 XMC     Yes         36K      155K         1051        3:36
W=2 full ¬fixed  COSPAN  Yes         194K     562K         15.9        29:40
                 Murφ    Yes         1.1M     4M           20          9:43
                 SMV     No
                 Spin    Yes         17K      22K          750         0:17
                 XMC     Yes         4K       7K           35          0:05
W=2 full fixed   COSPAN  No
                 Murφ    Yes         2.1M     7.7M         89          41:55
                 SMV     No
                 Spin    No
                 XMC     Yes         315K     1.33M        4708        47:15

Table 1: Model-checking results.

As can be gleaned from the results of Table 1, the special-purpose
cycle-detection algorithms of Spin and COSPAN served them well. In particular,
these tools were able to complete analysis of several complex versions of the
i-protocol, including W = 2/mini/¬fixed, W = 2/mini/fixed, and W = 2/full/¬fixed.
The ability to specify atomically executed code sections in Spin also proved
effective, enabling Spin to complete analysis of the W = 1/full/fixed version. Spin,
however, ran out of memory for W = 2/full/fixed, despite the use of partial-order
reduction and bitstate hashing (with 98% state-space coverage).
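Livelock of the kind found here amounts to a reachable cycle on which no progress (delivery to the upper layer) ever happens. The Python sketch below checks that property on an explicit transition graph: it first collects the reachable states and then searches the non-progress transitions for a cycle. It is a plain graph search chosen for readability, not the algorithm Spin or COSPAN actually implement, and the state names and edges in the example are a made-up abstraction of the Fig. 2 scenario.

```python
from collections import defaultdict


def find_livelock(init, edges):
    """Return a reachable cycle that uses only non-progress edges, or None.

    edges: iterable of (source, target, progress) triples, where progress is
    True when the transition delivers a message to the upper layer.
    """
    succ, quiet = defaultdict(list), defaultdict(list)
    for src, dst, progress in edges:
        succ[src].append(dst)
        if not progress:
            quiet[src].append(dst)

    # 1. Collect the reachable states (in a deterministic order).
    order, seen, todo = [], {init}, [init]
    while todo:
        s = todo.pop()
        order.append(s)
        for t in succ[s]:
            if t not in seen:
                seen.add(t)
                todo.append(t)

    # 2. Depth-first search for a cycle in the non-progress subgraph.
    colour = {}  # absent = unvisited, "grey" = on current path, "black" = done
    for root in order:
        if root in colour:
            continue
        colour[root] = "grey"
        path, stack = [root], [iter(quiet[root])]
        while stack:
            nxt = next(stack[-1], None)
            if nxt is None:                  # all successors explored
                colour[path.pop()] = "black"
                stack.pop()
            elif colour.get(nxt) == "grey":  # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            elif nxt not in colour:
                colour[nxt] = "grey"
                path.append(nxt)
                stack.append(iter(quiet[nxt]))
    return None


# Hypothetical abstraction of the livelock scenario (state names are made up).
original = [
    ("s0", "s1", False),  # sender sends DATA1
    ("s1", "s2", True),   # receiver delivers DATA1 (progress); ACK1 is lost
    ("s2", "s3", False),  # DATA2 is lost
    ("s3", "s4", False),  # timeouts: NAK1 and DATA1 are sent
    ("s4", "s3", False),  # receiver ignores both; sender times out again
]
fixed = original[:4] + [
    ("s4", "s5", False),  # receiver answers the NAK with an up-to-date ACK
    ("s5", "s2", True),   # DATA2 is retransmitted and delivered (progress)
]
print(find_livelock("s0", original))  # ['s3', 's4', 's3']
print(find_livelock("s0", fixed))     # None
```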

SMV exhibited excellent memory-usage performance on all runs of window 
size 1, but failed to complete in a reasonable amount of time on any run of 
window size 2. This is most likely due to an exponential blowup in the BDD 
representation for window sizes larger than 1. The dynamic variable reordering 
option of SMV was used on all runs reported in Table 1. Several static variable 
orderings were also tried, including a “sequential” ordering in which the vari- 
ables of the sender precede the variables of the sender-to-receiver medium, which 
precede the variables of the receiver, etc. An “interleaved” ordering, in which 
the components’ variables were strictly interleaved, was also attempted. In all 
cases, the dynamic reordering significantly outperformed the static ones. 
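The sensitivity of BDD size to variable ordering can be seen on a small function. The Python sketch below builds a reduced ordered BDD by exhaustive Shannon expansion and counts its internal nodes. The function (x1 ∧ y1) ∨ (x2 ∧ y2) ∨ (x3 ∧ y3) and the helper names are illustrative assumptions unrelated to the i-protocol model, and the code is not SMV's BDD package. With n pairs, the interleaved ordering yields 2n internal nodes, whereas the sequential ordering yields 2^(n+1) − 2.

```python
def robdd_size(f, order):
    """Count the internal nodes of the reduced ordered BDD of f under `order`.

    f maps a dict {variable: bool} to a truth value. The BDD is built by
    exhaustive Shannon expansion, so this only suits a handful of variables.
    """
    unique = {}  # (variable, low, high) -> node id; ids 0 and 1 are terminals

    def build(i, env):
        if i == len(order):
            return int(bool(f(env)))          # terminal node 0 or 1
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        if low == high:                       # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in unique:                 # share identical subgraphs
            unique[key] = len(unique) + 2
        return unique[key]

    build(0, {})
    return len(unique)


def pairs(n):
    """The function (x1 & y1) | (x2 & y2) | ... | (xn & yn)."""
    return lambda env: any(env[f"x{i}"] and env[f"y{i}"] for i in range(1, n + 1))


def interleaved(n):
    return [v for i in range(1, n + 1) for v in (f"x{i}", f"y{i}")]


def sequential(n):
    return [f"x{i}" for i in range(1, n + 1)] + [f"y{i}" for i in range(1, n + 1)]


print(robdd_size(pairs(3), interleaved(3)))  # 6
print(robdd_size(pairs(3), sequential(3)))   # 14
```

The gap grows exponentially with n, which is consistent with (but does not prove) the explanation given above for SMV's behavior at window size 2.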

Murφ and XMC performed the best on the i-protocol, completing on all cases
of interest. Murφ uniformly exhibited superior memory-usage behavior (relative to
all the other tools), due in part to the low-level nature of its specification
language (guarded commands) and the succinct manner in which it encodes system
states. Murφ was also fast. XMC, however, was faster than Murφ for all cases in
which the livelock error was present. This is because of the local, top-down nature
of XMC’s model-checking algorithm (Murφ is a global model checker). Prior
experience [RRR+97] indicates that the space requirements of XMC can be reduced
through source-level transformations aimed at optimizing the representation of
process terms. Finally, the number of states/transitions explored by XMC is
appreciably lower than for the other systems. This is primarily due to
XMC’s use of lazily evaluated logical variables to represent variables and data
structures in the specification, and the fact that XMC treats sequences of pure
computation steps as atomic.

6 Conclusions 

We have shown how an actual bug in a real-life communications protocol can
be detected and eliminated through the use of automatic verification tools
supporting model checking. We have also tried to demonstrate the i-protocol’s
effectiveness as a verification-tool benchmark by conducting a comparative study
of the performance of six widely used verification tools in analyzing the original
and livelock-free versions of the protocol.

† For the W = 1, mini, not fixed version of the i-protocol, the Factory took 70 minutes
and 41MB of memory to detect the livelock. For W = 2, mini, not fixed, the Factory
required 349 minutes and 118MB.

Pertinent future work includes recruiting the actual developers of the model 
checkers used in this study to encode and analyze the i-protocol. We expect 
that the performance of each tool will increase under these conditions and it 
would be interesting to learn what “tricks” the developers employ to attain this 
improvement. 

For completeness, other properties of the i-protocol should be checked besides 
the absence of livelock, such as deadlock-freedom and eventual message delivery. 
It would be particularly interesting to apply a tool with deductive reasoning 
capabilities, such as PVS [ORR+96], to the i-protocol, so that a parameterized 
version of the protocol (window size, buffer size, etc.) could be analyzed. 

Finally, we invite developers of verification tools besides those considered in 
this case study to try their hand at the i-protocol and report the results to us 
for posting on the i-protocol web site. This will assist protocol developers and 
other software engineers interested in pursuing automated verification to make 
an educated decision about which tool is right for the task at hand. 
Abstract. Compiled Java programs may be downloaded from the World
Wide Web and be executed on any host platform that implements the
Java Virtual Machine (JVM). However, in general it is impossible to
check the origin of the code and trust in its correctness. Therefore,
standard implementations of the JVM contain a bytecode verifier that
statically checks several security constraints before execution of the code.
We have formalized large parts of the JVM, covering the central parts
of object orientation, within the theorem prover Isabelle/HOL. We have
then formalized a specification for a Java bytecode verifier and formally
proved its soundness. While a similar proof done with paper and pencil
turned out to be incomplete, using a theorem prover like Isabelle/HOL
guarantees a maximum amount of reliability.



1 Introduction 

The Java Virtual Machine (JVM) is an abstract machine consisting of a
memory architecture and an instruction set. It is part of the Java language design
developed by Sun Microsystems and serves as a basis for Java implementations.
However, it can also be used as an intermediate platform for other programming
languages, since the JVM works independently of Java. The corresponding
compiler then generates architecture-independent JVM code instead of machine code
for a specific host platform. This approach allows execution of compiled JVM
code on any host platform that implements the JVM. However, this advantage
does not come without risks. One can download any JVM code from the World
Wide Web, and in general it is impossible to check the origin of the code and
trust in its correctness.

The Java Virtual Machine Specification (JVMS for short) [LY96] describes a set
of static and structural constraints that must hold for the code to ensure safe
execution, and requires that the JVM itself verify that these constraints hold.
However, this is not a formal specification, and it is in the nature of informal
descriptions to contain ambiguities or even inconsistencies. Our goal is to give
a fully formal specification of the JVM and a bytecode verifier that overcomes
this problem. We think that this work can be useful in several respects: on the
one hand, it allows the formal investigation of central concepts of the JVM,
such as the correctness of the bytecode verifier and compiler verification; on the
other hand, it may serve as a reference specification that is more accurate than the
informal description.

* Research supported by DFG project Bali.

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 89-103, 1999.

Formalizing a real-life programming language is a very complex task, and a
formalization done with paper and pencil is likely to be susceptible to errors of
varying severity. Therefore, tool assistance is required to reach a
maximum amount of reliability. A theorem prover like Isabelle/HOL [Pau94, Isa]
offers valuable support in developing consistent specifications and correct proofs.

To avoid the execution of incorrect JVM code, several verification strategies 
for JVM code may be used, for example: 

- Cohen [Coh97] has implemented a so-called defensive JVM using the theorem
prover ACL2. In this approach, runtime checks are performed to guarantee
a type-safe execution of the code.

- The JVMS [LY96] describes Sun’s implementation of a bytecode verifier, 
where most of the type-checking is done statically but several parts are 
delayed until runtime. 

- Qian [Qia98] has developed a specification for an extended bytecode verifier, 
where all type-checking is done statically. 

The specification of a bytecode verifier in Isabelle/HOL presented in this paper
follows Qian’s work. However, our formalization of the operational semantics
[Pus98] has been done independently of Qian’s approach. Therefore, we had to
deviate from Qian’s work in several points to make it fit our approach.

There are several other approaches to formalizing (parts of) the JVM (see
[Ber97, FM98, Gol97, HBL98, SA98]). As far as we know, our work is the first
to formally prove the soundness of a bytecode verifier using a theorem prover.

The rest of the paper is organized as follows: Section 2 briefly introduces
Isabelle/HOL. Section 3 describes our formalization of the JVM, in particular
the representation of runtime data and the definition of an operational semantics
for the JVM instructions. In Section 4 we introduce the notion of static
well-typedness and give a formal specification for a bytecode verifier. Section 5 defines
the notion of soundness for a bytecode verifier and sketches the corresponding
soundness proof. In Section 6 we discuss two extensions we have added to the
specification, and Section 7 summarizes our results and outlines future work.



2 Isabelle/HOL 

Isabelle [Pau94, Isa] is a generic theorem prover that can be instantiated with
different object logics. The formalization and proofs described in this paper
are based on the instantiation for Higher Order Logic, called Isabelle/HOL.
Below we give an overview of the basic types and functions used in
this paper.

Isabelle’s type system is very similar to that of ML, with slight syntactic
differences: function types are denoted by τ1 ⇒ τ2, where τ1 ⇒ τ2 ⇒ ... ⇒ τn
may be abbreviated as [τ1, τ2, ...] ⇒ τn. Product types are written as α × β × γ.

Functions are preferably defined in a curried style (i.e. f a b c). Occasionally
we have to define uncurried functions f (a, b, c); this is due to restrictions of
Isabelle’s package for well-founded recursive functions.

The basic types bool, nat and int are predefined. Isabelle/HOL also offers
the polymorphic types α set (with the usual set operators) and α list. The list
constructors are [] (‘nil’) and x#xs (‘cons’). The functions hd xs and tl xs return
the head and tail of a list. The i-th list element is written xs ! i, length xs computes
the length of a list, and set xs converts a list into a (finite) set. We also have
map f xs to apply a function to all elements of a list, and zip xs ys takes two lists
and returns a list of pairs.

Inductive datatypes can be defined by enumerating their constructors together
with their argument types. For example, the predefined datatype for
optional values looks as follows:

α option = None | Some α

In Isabelle/HOL, all functions are total. Partiality can be modeled using the
predefined ‘map’ type, which is defined as follows:

α ⇀ β = (α ⇒ β option)

We use the infix operator !! of type [α ⇀ β, α] ⇒ β for ‘partial’ function
application: whenever f x = Some y, then f !! x = y. In the case of None, the result
is an unknown value arbitrary, defined as εx. False (where ε is Hilbert’s description
operator).
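The difference between a map and its `!!` application can be mimicked in Python. In this sketch, which is our own approximation rather than Isabelle code, `Some` is a small wrapper class, Python's `None` plays the role of the `None` constructor, and a sentinel object `ARBITRARY` stands in for arbitrary; the names `bang_bang`, `lookup`, and `CLASSFILES` are hypothetical. Unlike the sentinel, HOL's arbitrary is a genuine but unspecified value of the result type, so nothing can be proved about it, whereas Python code can test for the sentinel.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Some(Generic[B]):
    """Constructor `Some` of `option`; Python's None plays the role of `None`."""
    value: B


# Stand-in for HOL's `arbitrary`: here a distinguishable sentinel, whereas in
# HOL it is an unspecified (but genuine) value of the result type.
ARBITRARY = object()


def bang_bang(f: Callable[[A], Optional[Some[B]]], x: A) -> object:
    """Model of f !! x: y if f x = Some y, otherwise `arbitrary`."""
    r = f(x)
    return r.value if isinstance(r, Some) else ARBITRARY


# A map from class names to classfiles, with hypothetical contents.
CLASSFILES = {"Object": "cf_Object", "C0": "cf_C0"}


def lookup(cn: str) -> Optional[Some[str]]:
    return Some(CLASSFILES[cn]) if cn in CLASSFILES else None


print(bang_bang(lookup, "C0"))                    # cf_C0
print(bang_bang(lookup, "Missing") is ARBITRARY)  # True
```

Wrapping results in `Some` keeps "defined with value None" distinct from "undefined", just as `Some None` differs from `None` in an option type.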

Throughout this paper, we write logical constants in sans serif, whereas vari- 
ables and types appear in italic. 

3 The Java Virtual Machine 

JVM code is stored in so-called classfiles. If the code is produced by compilation
of a Java program, each Java class is translated into a separate classfile.
Similar to Java classes, a JVM classfile contains information about inheritance and
implementation relations, as well as field and method definitions. Method code
consists of a sequence of JVM instructions (bytecode). The machine model of the
JVM has different memory areas for runtime data: a heap stores runtime objects,
and a frame stack contains state information for each active method invocation.
Each method frame has its own operand stack and local variables array. Similar
to Java, the JVM has an exception mechanism to treat error conditions. In our
formalization, we consider a set of predefined exceptions, but do not yet treat
exception handling.

We have formalized large parts of the JVM, including the classfile structure
and the operational semantics for a subset of JVM instructions covering the
central parts of object orientation. Due to lack of space, we cannot present the
entire formalization, which can be found in [Pus98, NOP]. However, we introduce
the main ideas of our approach.
3.1 JVM classfiles

The first component of a classfile consists of the constant pool, a kind of symbol 
table containing name and type information. This is followed by a flag indicating 
whether the classfile describes an interface or a class, several pointers to constant 
pool entries returning the names of the current class, its superclass and direct 
superinterfaces, and finally the field and method definitions: 

α classfile = cpool × iflag × idx × idx × idx list × fields × α methods

The type for methods is parameterized over the type of the method code, which 
may be instantiated later. This allows us to formalize the JVM instruction set 
and its operational semantics in a modular way. 

A predicate wf_classfiles checks the well-formedness of classfiles, e.g. the
superclass and superinterface relations must be acyclic and method overriding must
obey certain type restrictions.

Example: Consider a set of classfiles (see figure 1) consisting of class Object, as
well as the classes C0, C1, C2, and Q. C0 and Q are direct subclasses of Object;
C1 and C2 are both extensions of C0. Class C0 contains an integer field f0, and
class Q contains a method m.
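One of the conditions checked by wf_classfiles, acyclicity of the superclass relation, is easy to sketch. The Python function below (its name and the dictionary encoding are our assumptions) follows direct-superclass links from every class and reports a cycle if a class is revisited; superinterfaces and the overriding restrictions are not modelled. It is applied to the example hierarchy just described.

```python
from typing import Dict, Optional


def superclass_chain_acyclic(superclass: Dict[str, Optional[str]]) -> bool:
    """Check that following direct-superclass links never revisits a class.

    superclass maps each class name to its direct superclass; the root class
    (Object) maps to None. Only this one condition of wf_classfiles is modelled.
    """
    for start in superclass:
        visited = set()
        c: Optional[str] = start
        while c is not None:
            if c in visited:
                return False          # came back to a class: cycle
            visited.add(c)
            c = superclass.get(c)     # undefined names simply end the chain
    return True


# The example hierarchy: C0 and Q extend Object; C1 and C2 extend C0.
example = {"Object": None, "C0": "Object", "Q": "Object", "C1": "C0", "C2": "C0"}
print(superclass_chain_acyclic(example))                  # True
print(superclass_chain_acyclic({**example, "C0": "C2"}))  # False: C0 -> C2 -> C0
```

A full well-formedness check would also reject references to undefined superclasses, which this sketch silently treats as the end of the chain.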




Figure 2 shows the contents of classfile Q. The interface flag is set to False, and
cpool index 1 points to the name of class Q. This information extends over two
entries: the keyword Class indicates the entry type, and index 9 then points to
another entry containing the string Q (with keyword Utf8). The superclass index
points in the same way to class name Object. The description of method m again
contains two pointers: the first one returns the name m, the second one points to a
type descriptor. In our case, method m takes two arguments of types C1 and C2 and
returns an integer. The code section m_code will be shown later.

3.2 JVM Runtime Data 

The JVM operates on two different types of values, primitive values and reference
values. Among the primitive values, we consider only integers. The reference values
are pointers to objects; the null pointer is expressed by a special null reference.
The realization of object references is kept abstract: we model them by an opaque
type loc that is not further specified. We define a datatype for JVM values as
follows:
Fig. 2. Classfile for Q



val = Intg int | Addr loc | Null

You may have noticed that in contrast to our formalization, the JVMS [LY96] 
does not require values to be tagged with their runtime types. However, our 
approach does not impose any restrictions on possible implementations, because 
the type information is not used to determine the operational semantics of (cor- 
rect) JVM code. We use the type tags only to state and prove the correctness 
of the bytecode verifier, where the runtime types are checked against the static 
type information. 



3.3 Operational Semantics of JVM Instructions 



The JVMS [LY96] describes the operational semantics for each instruction in 
the context of a JVM state where several constraints hold, e.g. there must be an 
appropriate number of arguments on the operand stack, or the operands must 
be of a certain type. If the constraints are not satisfied, the behaviour of the 
JVM is undefined. 

In our approach, we formalize the behaviour of JVM instructions with total 
functions. If a state does not satisfy the constraints of the current instruction, 
e.g. if an element should be popped from an empty operand stack, the result will 
be the unknown value arbitrary. 

We have structured the instructions into several groups of related instruc- 
tions, describing each by its own execution function. This makes the operational 
semantics easier to understand, since every function only works on the parame- 
ters that are needed for the corresponding group of instructions: 



instr = LAS load_and_store | CO create_object | MO manipulate_object
      | MA manipulate_array | CH check_object | MI meth_inv
      | MR meth_ret | OS op_stack | CB cond_branch | UB uncond_branch
Now, we can instantiate the type parameter for the code section of a classfile
and introduce the following type abbreviation, describing a partial mapping from
class names to classfiles¹:

classfiles = ident ⇀ (instr list) classfile

Example: The code of method m is shown in figure 3. Aload i loads the content
of local variable i onto the operand stack. Ifnull 3 compares the top operand
stack element against Null and performs a conditional jump to pc = pc+3. Goto 2
performs an unconditional jump to pc = pc+2. Getfield 4 loads a field described
at cpool entry 4 onto the operand stack (in our example, the integer field
f0). Finally, Ireturn closes the current method invocation and returns the integer
result to the calling method.



pc   instr
0    Aload 1
1    Ifnull 3
2    Aload 1
3    Goto 2
4    Aload 2
5    Getfield 4
6    Ireturn



Fig. 3. Code of method m 
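The relative jumps in Fig. 3 can be checked by executing the code. The following Python sketch is a hypothetical mini-interpreter for just these five instructions; in the spirit of exec, `step` returns `None` in a final state, and `run` applies it until that happens. Simplifications (all ours): objects are plain dictionaries, Python's `None` stands for the null reference, and Getfield takes the field name f0 directly instead of the constant-pool index 4.

```python
from dataclasses import dataclass, replace
from typing import Optional

NULL = None  # assumption: Python's None stands for the JVM null reference


@dataclass(frozen=True)
class State:
    stack: tuple                    # operand stack, top element last
    local_vars: tuple               # local 0 = this, then the arguments
    pc: int = 0
    xcpt: Optional[str] = None      # exception flag
    returned: Optional[int] = None  # set by Ireturn


def step(code, s: State) -> Optional[State]:
    """One execution step; None if s is final (cf. exec returning None)."""
    if s.xcpt is not None or s.returned is not None:
        return None
    op, arg = code[s.pc]
    if op == "Aload":                 # push local variable `arg`
        return replace(s, stack=s.stack + (s.local_vars[arg],), pc=s.pc + 1)
    if op == "Ifnull":                # pop; jump by `arg` if it was null
        top, rest = s.stack[-1], s.stack[:-1]
        return replace(s, stack=rest, pc=s.pc + (arg if top is NULL else 1))
    if op == "Goto":                  # unconditional relative jump
        return replace(s, pc=s.pc + arg)
    if op == "Getfield":              # replace reference by its field `arg`
        ref, rest = s.stack[-1], s.stack[:-1]
        if ref is NULL:
            return replace(s, xcpt="NullPointer")
        return replace(s, stack=rest + (ref[arg],), pc=s.pc + 1)
    if op == "Ireturn":               # record the result and stop
        return replace(s, stack=s.stack[:-1], returned=s.stack[-1])
    raise ValueError(f"unsupported instruction {op}")


def run(code, s: State) -> State:
    """Apply step until it yields None, i.e. until a final state is reached."""
    while (nxt := step(code, s)) is not None:
        s = nxt
    return s


# Fig. 3, with the field name standing in for constant-pool index 4.
M_CODE = [("Aload", 1), ("Ifnull", 3), ("Aload", 1), ("Goto", 2),
          ("Aload", 2), ("Getfield", "f0"), ("Ireturn", None)]

c1_obj, c2_obj = {"f0": 1}, {"f0": 2}
print(run(M_CODE, State((), ("q", c1_obj, c2_obj))).returned)  # 1
print(run(M_CODE, State((), ("q", NULL, c2_obj))).returned)    # 2
print(run(M_CODE, State((), ("q", NULL, NULL))).xcpt)          # NullPointer
```

Tracing the runs: if the first argument is non-null, Ifnull falls through and Goto 2 at pc 3 jumps to pc 5; if it is null, Ifnull jumps to pc 4 and the second argument is used. So m returns the f0 field of its first argument when that is non-null, and of its second argument otherwise.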



Execution of a JVM instruction transforms the machine state. The machine
state is formalized as a triple consisting of an exception flag, an object heap, and
a frame stack. For each active method invocation, there exists a frame containing
its own operand stack, a list of local variables, the name of the current class, a
reference to the current method, and the program counter:

frame = opstack × locvars × ident × method_loc × pc

jvm_state = xcpt option × heap × frame list

If an exception has been raised or the frame stack is empty, execution
terminates.² If the machine has not yet reached a final state, the function exec performs
a single execution step: it calls an appropriate execution function (e.g. exec_mo)
and incorporates the result in the new machine state. If execution has reached a
final state, exec does not return a new state. This is modeled by embedding the
result state in an option type:

exec :: classfiles × jvm_state ⇒ jvm_state option
exec (CFS, (Some xp, hp, frs)) = None
exec (CFS, (None, hp, [])) = None
exec (CFS, (None, hp, (stk, loc, cn, ml, pc)#frs)) =
    case (get_code CFS cn ml) ! pc of MO ins ⇒ Some (... exec_mo ...) | ...

¹ We have abstracted from the size of instructions and regard the code section as a
list of instructions.

² We do not yet treat exception handling.
For example, the operational semantics of the Getfield instruction for object field
access looks like this:

exec_mo :: [manipulate_object, classfiles, cpool, heap, opstack, pc]
           ⇒ (xcpt option × heap × opstack × pc)
exec_mo (Getfield idx) CFS cp hp stk pc =
  let oref = hd stk;
      (cn, od) = get_Obj (hp !! (get_Addr oref));
      (fc, fn, fd) = extract_Fieldref cp idx;
      xp' = if oref = Null then Some NullPointer else None
  in
      (xp', hp, (od !! (fc, fn)) # (tl stk), pc+1)

CFS denotes a set of JVM classfiles. The operand stack stk is supposed to contain
a reference to a class instance stored on the heap hp. In case of a null reference,
an exception is thrown. Otherwise, the referenced object contains class name cn
and object data od. Index idx should point to a Fieldref entry in the constant pool
cp, containing a class name fc, a field name fn and a field descriptor fd. The tuple
(fc, fn) determines the field whose value is stored on the operand stack. Finally,
the program counter pc is incremented.
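The Getfield rule can be transcribed almost line by line. The Python sketch below mirrors exec_mo (Getfield idx) with tagged values (Intg, Addr, Null), a heap mapping locations to (class name, object data) pairs, and constant-pool entries of the form (tag, class, field, descriptor); these encodings, the integer locations, and the example constant pool and heap are assumptions for illustration. One deliberate deviation: for a null reference the formalization leaves the other result components arbitrary, while the sketch returns them unchanged.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Intg:
    value: int


@dataclass(frozen=True)
class Addr:
    loc: int  # locations are abstract in the formalization; ints are our choice


@dataclass(frozen=True)
class Null:
    pass


Val = Union[Intg, Addr, Null]
Heap = Dict[int, Tuple[str, Dict[Tuple[str, str], Val]]]  # loc -> (cn, od)
CPool = Dict[int, Tuple[str, str, str, str]]                # idx -> entry


def exec_getfield(idx: int, cp: CPool, hp: Heap, stk: List[Val], pc: int):
    """Sketch of exec_mo (Getfield idx); returns (xp', hp, stk', pc')."""
    oref = stk[0]  # hd stk: the top of the stack is the head of the list
    if isinstance(oref, Null):
        # In the formalization the remaining components are then unspecified
        # (`arbitrary`); this sketch simply leaves them unchanged.
        return ("NullPointer", hp, stk, pc)
    _cn, od = hp[oref.loc]                 # (cn, od) = get_Obj (hp !! ...)
    tag, fc, fn, _fd = cp[idx]             # extract_Fieldref cp idx
    if tag != "Fieldref":
        raise ValueError("constant pool entry is not a Fieldref")
    return (None, hp, [od[(fc, fn)]] + stk[1:], pc + 1)


cp = {4: ("Fieldref", "C0", "f0", "I")}     # hypothetical constant pool
hp = {7: ("C1", {("C0", "f0"): Intg(42)})}  # a C1 object with inherited f0
xp, _, stk, pc = exec_getfield(4, cp, hp, [Addr(7)], 5)
print(xp, stk, pc)                                # None [Intg(value=42)] 6
print(exec_getfield(4, cp, hp, [Null()], 5)[0])   # NullPointer
```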

Execution of the entire code then consists of repeated application of exec as
long as the result is not None. The relation CFS ⊢ σ −→* σ' relates a given set
of classfiles CFS and a JVM state σ to a new state σ', where the pair (σ, σ') is in
the reflexive transitive closure of successful execution steps:

_ ⊢ _ −→* _ :: [classfiles, jvm_state, jvm_state] ⇒ bool

CFS ⊢ σ −→* σ' = (σ, σ') ∈ {(s, t). exec (CFS, s) = Some t}*



4 A Specification for a Bytecode Verifier 

Standard implementations of the JVM contain a bytecode verifier that statically 
checks several security constraints before execution of the code. One main aspect 
of the bytecode verifier is to statically derive the types of possible runtime data 
and check that all instructions will get arguments of the correct type. 



4.1 Static types 

As Qian has pointed out in his work [Qia98], the attempt to statically type-check
JVM code requires the introduction of reference type sets instead of single types.
This is due to the fact that, as a result of a branching instruction, a program
point may have multiple preceding program points. These predecessor points
are allowed to contain values of different types.³ In this case, the types of the
two branches have to be merged to the first common supertype. However, the
JVM allows multiple inheritance of interfaces, and therefore this supertype is
not necessarily unique.

³ Surprisingly, the typing rule for the conditional expression of the Java source
language, which works similarly, turns out to be more restrictive (see [GJS96] and
the discussion at [Typ]): it requires that the two branches yield two types where
the first is a supertype of the second or vice versa.

Cornelia Pusch

Qian defines a static type system including types representing addresses of
subroutine calls and uninitialized objects. We do not yet consider these aspects
of the JVM, but have added array types. Static types are represented as values
of datatype tys. Among the primitive types, we only consider type Integer. A
reference type is either the type of the null reference (NT), an interface or
class name (IT id or CT id), or an array type (AT ts, where ts contains the type
of the components of the array). A static type then consists either of a primitive
type or of a list of reference types.⁴ During bytecode verification, type information
of different execution paths has to be merged. In case of incompatible types, the
result becomes unusable. This is expressed by a value of type any, which is either
a static type or Unusable. The return type of methods is denoted by a value of
type tyOrVoid, which is either a static type or Void:

prim = Integer

ref = NT | IT ident | CT ident | AT tys

tys = PTS prim | RTS (ref list)

any = Unusable | US tys

tyOrVoid = Void | TY tys

We abbreviate US (PTS p) and US (RTS r) by Prim p and Refs r. 

If two types are merged, the resulting supertype must cover both types.
A type a covers a type a' (written CFS ⊢ a ⊒ a'), if any instruction that is
applicable to all values of type a is also applicable to all values of type a'. The
predicate holds in the following cases:

_ ⊢ _ ⊒ _ :: [classfiles, any, any] ⇒ bool
CFS ⊢ Unusable ⊒ a'

CFS ⊢ Prim Integer ⊒ Prim Integer

CFS ⊢ Refs rs ⊒ Refs rs' = (∀r' ∈ set rs'. ∃r ∈ set rs. widenConv CFS r' r)

Qian gives a more restrictive definition, identifying the covering of reference types
with the superset relation. In our definition, an element of the subtype need not
be contained in the supertype; it just has to be convertible to one of its elements.

A state type contains type information for all local variables and the operand
stack of the current invocation frame at a certain program point. The local
variables may contain unusable values (as a result of merging two incompatible
types), whereas only usable values may be stored on the operand stack. We
extend the predicate ⊒ in two steps to state types:

state_type = tys list × any list

_ ⊢ _ ⊒ _ :: [classfiles, any list, any list] ⇒ bool
CFS ⊢ as ⊒ as' =
    length as = length as' ∧ (∀(a, a') ∈ set (zip as as'). CFS ⊢ a ⊒ a')



⁴ Due to restrictions on the construction of inductive datatypes, we model reference
type sets as lists.

Proving the Soundness of a Java Bytecode Verifier Specification



_ ⊢ _ ⊒ _ :: [classfiles, state_type, state_type] ⇒ bool
CFS ⊢ (ST, LT) ⊒ (ST', LT') =
    CFS ⊢ map US ST ⊒ map US ST' ∧ CFS ⊢ LT ⊒ LT'
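The covering relation and its lifting to state types can be sketched as executable predicates. In this Python sketch, types are encoded as tuples, widenConv is reduced to "the null type converts to every reference type and a class converts to its superclasses" (interface and array types are left out), and the class hierarchy is the example one; all names are hypothetical, and the local-variable types are our guess since figure 4 is not reproduced here. It checks that Refs [CT C0] covers the stack types reaching pc 5 of method m from both predecessors, which Qian's superset reading would reject.

```python
from typing import Dict, Optional

# Example hierarchy: C1, C2 extend C0; C0 and Q extend Object.
SUPERCLASS: Dict[str, Optional[str]] = {
    "Object": None, "C0": "Object", "Q": "Object", "C1": "C0", "C2": "C0"}

NT = ("NT",)                 # type of the null reference
UNUSABLE = ("Unusable",)
INTEGER = ("Prim", "Integer")


def CT(name):
    """Class type CT name."""
    return ("CT", name)


def Refs(*refs):
    """Refs [r1, ..., rk], i.e. US (RTS [r1, ..., rk])."""
    return ("Refs", tuple(refs))


def is_subclass(c, d):
    while c is not None:
        if c == d:
            return True
        c = SUPERCLASS.get(c)
    return False


def widen_conv(r_sub, r_sup):
    """Simplified widenConv: NT to any reference, a class to its superclasses."""
    if r_sub == NT:
        return True
    return r_sub[0] == "CT" and r_sup[0] == "CT" and is_subclass(r_sub[1], r_sup[1])


def covers(a, b):
    """CFS |- a ⊒ b for two values of type `any`."""
    if a == UNUSABLE:
        return True
    if a == INTEGER:
        return b == INTEGER
    if a[0] == "Refs" and b[0] == "Refs":
        return all(any(widen_conv(r2, r) for r in a[1]) for r2 in b[1])
    return False


def covers_list(xs, ys):
    return len(xs) == len(ys) and all(covers(x, y) for x, y in zip(xs, ys))


def covers_state(s, t):
    """(ST, LT) ⊒ (ST', LT'); stack entries are already `any` values here."""
    (st, lt), (st2, lt2) = s, t
    return covers_list(st, st2) and covers_list(lt, lt2)


def covers_qian(a, b):
    """Qian's stricter reading for reference types: plain superset."""
    return set(b[1]) <= set(a[1])


LT = [Refs(CT("Q")), Refs(CT("C1")), Refs(CT("C2"))]  # assumed local types
at_pc5 = ([Refs(CT("C0"))], LT)
from_pc3 = ([Refs(CT("C1"))], LT)   # Aload 1 at pc 2, then Goto 2 at pc 3
from_pc4 = ([Refs(CT("C2"))], LT)   # Aload 2 at pc 4
print(covers_state(at_pc5, from_pc3), covers_state(at_pc5, from_pc4))  # True True
print(covers_qian(Refs(CT("C0")), Refs(CT("C1"))))                     # False
```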

Type information for the entire code of a method is collected in a value of method
type. A value of class type maps a method reference to a value of method type,
and a value of program type maps a class name to a value of class type:

method_type = state_type list
class_type = method_loc ⇒ method_type

prog_type = ident ⇒ class_type



4.2 Static Well-typedness 

A bytecode verifier has to infer type information for each instruction and then 
check if the method code is well-typed. In our specification, well-typedness is 
checked with respect to a given type. A correct implementation of that specifi- 
cation must then compute a type that is well-typed according to the specification. 

We define a type checking predicate that checks whether an instruction at 
a certain program point is well-typed with respect to a given method type. 
Additionally, it checks several other constraints, e.g. an index to local variables 
must not be greater than the number of local variables and the program counter 
must remain within the current method. These constraints are indispensable 
to carry out the soundness proof for the bytecode verifier. The type-checking 
predicate makes a case distinction over the instruction to be executed at the 
current program point. In case of Getfield, the instruction is well-typed if the
following predicate holds:

wt_MO :: [manipulate_object, classfiles, cpool, method_type, pc, pc] ⇒ bool
wt_MO (Getfield idx) CFS cp A maxpc pc =
  let (ST, LT) = A ! pc;
      (fc, fn, fd) = extract_Fieldref cp idx
  in
      pc+1 < maxpc ∧ is_class CFS fc ∧
      get_fields (CFS !! fc) (fc, fn) = Some fd ∧
      ∃rs ST'. ST = (RTS rs) # ST' ∧
               widenConv CFS rs [CT fc] ∧
               CFS ⊢ A ! (pc+1) ⊒ (fd # ST', LT)

All well-typedness predicates contain a line of the form CFS ⊢ A ! (pc+1) ⊒ new_type,
which means that the next instruction expects a type according to new_type. Since
that next instruction possibly has other predecessors, its type information is not
necessarily equal to new_type, but rather must cover it.

The above predicate enforces that the incremented program counter pc+1 does 
not exceed the code length maxpc. The class fc must be defined and must contain 
a field with name fn according to the constant pool entry. The stack must not 
be empty and the top stack element must contain a reference type convertible 
to the type of fc. Finally, the next instruction must expect a type according to 
the field descriptor fd on top of the operand stack. 
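The conjuncts of wt_MO (Getfield idx) translate directly into a Boolean check. The Python sketch below evaluates them for the Getfield instruction at pc 5 of method m. Its encodings are simplifications of our own: reference type sets hold bare class names, the field type is looked up in a hypothetical FIELDS table instead of being compared against the Fieldref descriptor, is_class is membership in a fixed hierarchy, Unusable is omitted, and the method type A is a guess (figure 4 is not reproduced) chosen to agree with the type ST ! 5 = [RTS [CT C0]] given in the example for method m later in this section.

```python
from typing import Dict, Optional

SUPERCLASS: Dict[str, Optional[str]] = {
    "Object": None, "C0": "Object", "Q": "Object", "C1": "C0", "C2": "C0"}
CPOOL = {4: ("Fieldref", "C0", "f0")}           # hypothetical constant pool
FIELDS = {("C0", "f0"): ("Prim", "Integer")}    # (class, field) -> field type


def is_subclass(c, d):
    while c is not None:
        if c == d:
            return True
        c = SUPERCLASS.get(c)
    return False


def covers(a, b):
    """Simplified ⊒: Refs hold bare class names; Unusable is omitted."""
    if a[0] == "Refs" and b[0] == "Refs":
        return all(any(is_subclass(c2, c) for c in a[1]) for c2 in b[1])
    return a == b


def covers_state(s, t):
    """Pointwise ⊒ on both the stack types and the local-variable types."""
    return all(len(x) == len(y) and all(map(covers, x, y)) for x, y in zip(s, t))


def wt_getfield(idx, A, maxpc, pc):
    """Sketch of wt_MO (Getfield idx) CFS cp A maxpc pc."""
    st, lt = A[pc]
    _tag, fc, fn = CPOOL[idx]
    fd = FIELDS.get((fc, fn))
    return (pc + 1 < maxpc                                 # stays inside code
            and fc in SUPERCLASS                           # is_class CFS fc
            and fd is not None                             # field (fc, fn) exists
            and len(st) > 0 and st[0][0] == "Refs"         # ST = RTS rs # ST'
            and all(is_subclass(c, fc) for c in st[0][1])  # widenConv rs [CT fc]
            and covers_state(A[pc + 1], ([fd] + st[1:], lt)))


LT = [("Refs", ("Q",)), ("Refs", ("C1",)), ("Refs", ("C2",))]
A = {5: ([("Refs", ("C0",))], LT), 6: ([("Prim", "Integer")], LT)}
print(wt_getfield(4, A, 7, 5))                       # True
print(wt_getfield(4, {5: ([], LT), 6: A[6]}, 7, 5))  # False: empty stack
```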
Similarly to the execution function exec, we define a predicate wt_instr that
selects the appropriate well-typedness predicate for each group of instructions. 
We extend the notion of well-typedness to methods, classes, and programs: at 
the beginning of a method body, the operand stack must be empty, and the local 
variables must contain values according to the type of the current class cn and 
the parameter descriptor pd of the current method: 

wt_start :: [classfiles, ident, param_desc, method_type] ⇒ bool
wt_start CFS cn pd A =
    CFS ⊢ A ! 0 ⊒ ([], (Refs [CT cn]) # (map (fd2any CFS) pd))

The code array of a method must not be empty, i.e. its length must be greater 
than zero. A method is well-typed with respect to a method type A, if it is 
well-typed at the beginning of the method body, and if for every program point 
in the method body the instruction is well-typed: 

wt_method :: [classfiles,ident,param_desc,return_desc,instr list,method_type] ⇒ bool
wt_method CFS cn pd rd ins φ ≡
  let cp = get_cpool (CFS !! cn);
      maxpc = length ins
  in
  0 < maxpc ∧ wt_start CFS cn pd φ ∧
  (∀pc. pc < maxpc ⟶ wt_instr (ins ! pc) CFS rd cp φ maxpc pc)
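Because wt_method only quantifies over program points, it can be written generically over the per-instruction check. The following Ocaml sketch does that; its toy instance, in which a state type is just the stack height and the instruction set is Push, Pop and Return, is a hypothetical stand-in for the real state types (ST,LT) and for the arguments CFS, cn, pd and rd.

```ocaml
(* Generic sketch of wt_method: the code is non-empty, the start state is
   well-typed, and every instruction is well-typed at its program point. *)
let wt_method ~wt_start ~wt_instr ins phi =
  let maxpc = Array.length ins in
  let rec all_ok pc =
    pc >= maxpc || (wt_instr ins.(pc) phi maxpc pc && all_ok (pc + 1))
  in
  0 < maxpc && wt_start phi && all_ok 0

(* Hypothetical toy instance: a state type is only the stack height *)
type instr = Push | Pop | Return

let wt_start phi = phi.(0) = 0 (* empty stack at method entry *)

let wt_instr i phi maxpc pc =
  match i with
  | Push -> pc + 1 < maxpc && phi.(pc + 1) = phi.(pc) + 1
  | Pop -> phi.(pc) > 0 && pc + 1 < maxpc && phi.(pc + 1) = phi.(pc) - 1
  | Return -> true

let ok = wt_method ~wt_start ~wt_instr [| Push; Pop; Return |] [| 0; 1; 0 |]
```

The short-circuiting conjunction checks 0 < maxpc first, so an empty code array is rejected before the start state is ever inspected.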

Example: Method m is well-typed with respect to the method type shown in
figure 4. The Getfield instruction at pc = 5 requires an element of reference type
on top of the operand stack. This element may have been put there either by the Aload 1
instruction at pc = 2 or by the Aload 2 instruction at pc = 4. This is reflected by the
static type ST ! 5 = [RTS [CT C0]], which covers both possibilities.⁵
Fig. 4. Static type of method m



A class is well-typed with respect to a class type Φ if every method defined
in that class is well-typed with respect to the corresponding method type:

wt_class :: [classfiles,ident,class_type] ⇒ bool
wt_class CFS cn Φ ≡
  ∀ml rd ins. get_methods (CFS !! cn) ml = Some (rd,ins)
    ⟶ wt_method CFS cn (snd ml) rd ins (Φ ml)



⁵ ST ! 5 = [RTS [CT C1, CT C2]] is also a correct type.
A JVM program is well-typed with respect to a program type Φ if every defined
class is well-typed with respect to the corresponding class type:

wt_classfiles :: [classfiles,prog_type] ⇒ bool

wt_classfiles CFS Φ ≡ ∀cn. is_class CFS cn ⟶ wt_class CFS cn (Φ cn)



5 Soundness of the Bytecode Verifier Specification 

A bytecode verifier (or, more abstractly, a type system) statically determines the
types of all runtime data. A type system is sound if the statically predicted type
gives a correct approximation of the runtime value produced during execution.⁶
In this section, we will show that our specification of a bytecode verifier is 
sound. For a concrete implementation of a bytecode verifier, it then remains to 
be proved that it satisfies our specification. 



5.1 Correct Approximation of Runtime Values 

In our formalization, runtime values carry some type information (see §3.2),
whereas Qian has to go through the code and assign a type tag to each value,
depending on the instruction that created it. However, he gives only an
informal argument that all runtime values can indeed be associated with a tag.
Therefore, our correctness relation between runtime data and static types differs
from the one given in [Qia98]:

approx_val :: [classfiles,heap,val,any] ⇒ bool
approx_val CFS hp (Intg i) at = CFS ⊢ at ⊒ Prim Integer
approx_val CFS hp Null at = ∃rs. CFS ⊢ at ⊒ Refs rs
approx_val CFS hp (Addr a) at = ∃obj. hp a = Some obj ∧
                                  CFS ⊢ at ⊒ (fd2any CFS (get_obj_type obj))

An integer value must have static type Integer or Unusable. The Null reference
is approximated by any reference type or by Unusable. In the case of an object
reference Addr a, the address must denote an object in the heap, and the type of
that object must be a subtype of the static type.
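As an illustration of the three approx_val cases, here is a hedged Ocaml sketch. It assumes a simplified encoding that is not part of the formalization: the heap maps an address to the class name of the stored object, hier maps a class to its direct superclass, and static types are reduced to Integer, Refs and Unusable.

```ocaml
(* Toy model of approx_val (hypothetical encoding, not the Isabelle one) *)
type value = Intg of int | Null | Addr of int
type any = Integer | Refs of string list | Unusable

(* subclass hier c d: c equals d or lies below d *)
let rec subclass hier c d =
  c = d
  || (match List.assoc_opt c hier with
      | Some parent -> subclass hier parent d
      | None -> false)

let approx_val hier heap v at =
  match v with
  | Intg _ -> (match at with Integer | Unusable -> true | Refs _ -> false)
  | Null -> (match at with Refs _ | Unusable -> true | Integer -> false)
  | Addr a ->
    (match List.assoc_opt a heap with
     | None -> false (* the address must denote an existing object *)
     | Some cls ->
       (match at with
        | Unusable -> true
        | Refs ds -> List.exists (subclass hier cls) ds
        | Integer -> false))

let hier = [ ("C1", "C0"); ("C0", "Object") ]
let heap = [ (7, "C1") ]
```

With this hypothetical heap, Addr 7 is approximated by Refs ["C0"] but not by Refs ["C2"], and a dangling address is not approximated even by Unusable.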

This notion of correct approximation is extended to local variables and the 
operand stack: 

approx_loc :: [classfiles,heap,locvars,any list] ⇒ bool
approx_loc CFS hp loc LT ≡
  length loc = length LT ∧ (∀(val,any) ∈ set (zip loc LT). approx_val CFS hp val any)

approx_stk :: [classfiles,heap,opstack,tys list] ⇒ bool
approx_stk CFS hp stk ST ≡
  length stk = length ST ∧ (∀(val,tys) ∈ set (zip stk ST). approx_val CFS hp val (US tys))
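Both predicates follow the same pattern: equal lengths plus a pointwise approx_val check, with stack types first converted by US. A minimal Ocaml sketch, with the value relation and the conversion passed in as parameters; the toy instance (option values, string types, the tys variant and the us function) is hypothetical.

```ocaml
(* Equal lengths plus a pointwise check; approx and us are parameters *)
let approx_list approx vals tys =
  List.length vals = List.length tys && List.for_all2 approx vals tys

let approx_loc approx loc lt = approx_list approx loc lt

(* stack types are first converted by us, mirroring US in approx_stk *)
let approx_stk ~us approx stk st =
  approx_list (fun v t -> approx v (us t)) stk st

(* Hypothetical toy instance: Some n is an integer, None is null *)
type tys = TInt | TRef
let us = function TInt -> "int" | TRef -> "ref"

let toy_approx v t =
  match v, t with
  | Some _, "int" | None, "ref" | _, "unusable" -> true
  | _ -> false
```

The length test comes first, so List.for_all2 is never called on lists of different lengths (where it would raise Invalid_argument).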



⁶ This is often formulated as ‘runtime data must be correct with respect to its static
type’. Technically, there is no difference, but we regard our view as more intuitive.







Cornelia Pusch 



5.2 Soundness Proof 

Qian states a soundness theorem saying that for statically well-typed bytecode,
the correctness relation between runtime values and static types of the current
operand stack and local variables is preserved in every execution step.
However, his proof given in [Qia97] remains sketchy, and it turns out that the
theorem cannot be proved in the given form. A stronger proof invariant has to
be formulated, assuring the correctness not only of the current operand stack
and local variables, but of the entire state containing all active
invocation frames. In particular, the method executed in the (n+1)-th frame
must correspond to a method invocation in the n-th frame.

We define several auxiliary predicates to formulate the correctness of all state 
components: in a correct heap, all objects contain correct data: 

correct_obj :: [classfiles,heap,obj] ⇒ bool
correct_obj CFS hp (Obj cn od) =
  is_class CFS cn ∧
  (∀fl fd. (get_all_fields (CFS,cn)) fl = Some fd
     ⟶ (∃val. od fl = Some val ∧ approx_val CFS hp val (fd2any CFS fd)))
correct_obj CFS hp (Arr fd ad) =
  (∀val ∈ set ad. approx_val CFS hp val (fd2any CFS fd))

correct_heap :: [classfiles,heap] ⇒ bool

correct_heap CFS hp ≡ ∀a obj. hp a = Some obj ⟶ correct_obj CFS hp obj

The predicate correct_frame checks whether the operand stack entries stk and
local variables loc have been approximated correctly by the state type (ST,LT).
Additionally, the frame itself must be well-formed, i.e. the class cn is defined,
the method reference ml points to an existing method, and the program counter
pc points to an instruction inside the method code:

correct_frame :: [classfiles,heap,state_type,frame] ⇒ bool
correct_frame CFS hp (ST,LT) (stk,loc,cn,ml,pc) ≡
  approx_stk CFS hp stk ST ∧ approx_loc CFS hp loc LT ∧ is_class CFS cn ∧
  (∃rd ins. get_methods (CFS !! cn) ml = Some (rd,ins) ∧ pc < length ins)

The predicate correct_frames checks whether a method reference ml_{n+1} and a
return descriptor rd_{n+1} (belonging to frame f_{n+1}) fit the next frame f_n of the
frame stack. If the frame stack is empty, the method must have return type void
(i.e. return descriptor V). If there exists a frame f_n, the last executed instruction
must have invoked method ml_{n+1} with return type rd_{n+1}. Besides that, f_n itself
must be correct. These checks are performed recursively on the remaining stack:

correct_frames :: [classfiles,heap,prog_type,return_desc,method_loc,frame list] ⇒ bool
correct_frames CFS hp Φ rd_{n+1} ml_{n+1} [] = (rd_{n+1} = V)
correct_frames CFS hp Φ rd_{n+1} ml_{n+1} (f_n#frs) =
  let (stk,loc,cn,ml,pc) = f_n;
      (rd,ins) = get_methods (CFS !! cn) !! ml;
      cp = get_cpool (CFS !! cn);
      (ST,LT) = (Φ cn ml) ! pc
  in
  (∃mi c k l. pc = k+1 ∧ ins ! k = MI mi ∧ extract_meth cp mi = (c,ml_{n+1},rd_{n+1},l)) ∧
  correct_frame CFS hp (pop_rd CFS rd_{n+1} ST, LT) f_n ∧
  correct_frames CFS hp Φ rd ml frs
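The recursive frame-chaining condition of correct_frames can be sketched in Ocaml as follows. This is a simplification under stated assumptions: the per-frame data check correct_frame and pop_rd are omitted, an invocation instruction is assumed to carry the callee name and return descriptor directly (rather than a constant pool index), and the instr and frame types are hypothetical.

```ocaml
(* Hypothetical encoding: Invoke (method, return descriptor) *)
type instr = Invoke of string * string | Other

type frame = { code : instr array; pc : int; meth : string; rd : string }

(* (ml, rd) belongs to the frame directly above; frs lists the caller
   frames, nearest caller first *)
let rec correct_frames (ml, rd) frs =
  match frs with
  | [] -> rd = "V" (* the outermost method must return void *)
  | f :: rest ->
    (* pc = k + 1 and instruction k invoked the method of the frame above *)
    f.pc >= 1
    && f.pc - 1 < Array.length f.code
    && f.code.(f.pc - 1) = Invoke (ml, rd)
    && correct_frames (f.meth, f.rd) rest

let main = { code = [| Other; Invoke ("m", "I"); Other |]; pc = 2; meth = "main"; rd = "V" }
```

Here a frame executing m (returning I) sits on top of main, whose last executed instruction invoked m; since main itself returns void, the chain is accepted.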

The entire state is correct if an exception has been thrown or the frame stack
is empty. In case of a nonempty frame stack, the heap must be correct, the top-level
frame f_{n+1} must be correct, and the remaining frames frs must be correct
with respect to the method ml_{n+1} executed on the top-level frame and its return
descriptor rd_{n+1}:

correct_state :: [classfiles,prog_type,jvmstate] ⇒ bool
correct_state CFS Φ (Some x,hp,frs) = True
correct_state CFS Φ (None,hp,[]) = True
correct_state CFS Φ (None,hp,f_{n+1}#frs) =
  let (stk,loc,cn,ml_{n+1},pc) = f_{n+1};
      (rd_{n+1},ins) = get_methods (CFS !! cn) !! ml_{n+1}
  in
  correct_heap CFS hp ∧
  correct_frame CFS hp ((Φ cn ml_{n+1}) ! pc) f_{n+1} ∧
  correct_frames CFS hp Φ rd_{n+1} ml_{n+1} frs

Now we can prove the following main soundness theorem: 



wf_classfiles CFS ∧ wt_classfiles CFS Φ ∧
correct_state CFS Φ σ ∧ CFS ⊢ σ →* σ'
⟹ correct_state CFS Φ σ'



It says that for a set of well-formed classfiles CFS that are statically well-typed
with program type Φ, program execution starting in a correct state σ leads only to correct
states σ'.⁷ This means that, starting from a correct initial state (invoking the
main method of the executed class), all possible runtime data of a program CFS
is correctly approximated by its static type Φ. Inspecting the definitions of well-typedness
and correct approximation, we can conclude that all required
constraints will be satisfied at runtime; e.g. in the case of the Getfield instruction,
the top operand stack element will be a reference value Null or Addr a.

The proof of the main theorem has been carried out by induction over
CFS ⊢ σ →* σ'. The preservation of the correctness property for a single
execution step then had to be shown by case distinction over the instructions.

⁷ Remember that in our formalization, execution of a program is guaranteed by definition,
since we modeled it using total functions.

6 Extensions to the Bytecode Verifier Specification

A bytecode verifier implementing our specification rejects some bytecode that would
do no harm at runtime. Of course, it is not possible to build a complete static
type system, since static well-typedness is undecidable. However, we can eliminate
two unnecessary restrictions in our specification: instructions that are not
reachable, i.e. dead code, may be neglected during type-checking, and operand
stack values may be of type Unusable if they are not used for further computation.
In fact, optimizing compilers will detect and eliminate dead code. However,
bytecode may stem from other sources; e.g. it may be hand-written. Besides that,
we wanted to check the modularity of our proofs: a modification of our specification
should not entail too many adaptations of our proof scripts.

Therefore, we have defined a predicate reach :: [instr list,nat] ⇒ bool, which checks
whether a certain program point may be reached from the starting point. We
then replaced the premise pc < length ins in our definition of wt_method by
reach ins pc. Consequently, we had to adapt our proof invariant: a correct state now
only contains reachable program points. We could then prove the new correctness
statement by using an additional lemma stating that any reachable state leads
to another reachable state. The existing lemmas were not affected.
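The definition of reach is not spelled out here; one common way to compute such a predicate is a depth-first traversal of the control-flow successors from program point 0. The Ocaml sketch below assumes a hypothetical four-instruction language with explicit jump targets and is not the Isabelle definition used in the formalization.

```ocaml
(* Hypothetical mini instruction set with explicit jump targets *)
type instr = Nop | Goto of int | Ifeq of int | Return

(* successor program points of the instruction at pc *)
let succs pc = function
  | Nop -> [ pc + 1 ]
  | Goto t -> [ t ]
  | Ifeq t -> [ pc + 1; t ]
  | Return -> []

(* reach ins target: can target be reached from program point 0? *)
let reach (ins : instr array) target =
  let n = Array.length ins in
  let seen = Array.make n false in
  (* depth-first search over a worklist of pending program points *)
  let rec visit = function
    | [] -> ()
    | pc :: rest when pc < 0 || pc >= n || seen.(pc) -> visit rest
    | pc :: rest ->
      seen.(pc) <- true;
      visit (succs pc ins.(pc) @ rest)
  in
  if n > 0 then visit [ 0 ];
  target >= 0 && target < n && seen.(target)

(* instruction 2 is dead code: the Goto at 1 jumps over it *)
let code = [| Nop; Goto 3; Nop; Return |]
```

With wt_method relaxed to reach ins pc, the dead Nop at index 2 would no longer need a well-typed entry.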

Our second extension, the introduction of possibly Unusable values on the
operand stack, did not require any changes to the proofs at all. A positive side
effect is that the formalization becomes more readable, since operand stack and local
variables are now treated uniformly, both admitting values of type any.

7 Results and Further Work 

We have given a fully formal specification for the JVM and a bytecode veri- 
fier, and then formally proved the soundness of the bytecode verifier using the 
theorem prover Isabelle/HOL. The formalization of the JVM classfile structure 
and the operational semantics comprises about 1000 lines, the specification of the 
bytecode verifier took another 500 lines. The proof scripts contain approximately 
2400 lines. It took about six months to develop the formalization and conduct the
proof. The most complex parts of the proof concern the instructions for field 
access and method invocation, where the existence of a field or method for some 
static type must assure that an appropriate field or method can be found at 
runtime. 

Isabelle/HOL turned out to be an adequate instrument for modeling real-life
programming languages such as Java (see also [ON98]). Naturally, we
had to make certain restrictions in this first approach to formalizing the JVM.
For example, we do not consider the size of instructions and their operands, but
use abstract datatypes instead. These abstractions can be refined in further
development steps of our formalization.

As next steps, we want to extend our formalization and the proof to subroutine
calls and object initialization. The work done by Qian [Qia98], Stata and
Abadi [SA98], and Freund and Mitchell [FM98] showed that these constructs
form the most complex part of bytecode verification, and are therefore worth a
fully formal investigation using a theorem prover.
Abstract. We present formal techniques for improving the performance
of modular communication systems. For common sequences of opera- 
tions we identify a fast-path through a stack of communication protocols 
and reconfigure the system’s code accordingly. Our techniques are im- 
plemented by tactics and theorems of the NuPRL proof development 
system and have been used successfully for the reconfiguration of appli- 
cation systems built with the Ensemble group communication toolkit. 



1 Introduction 

Due to the wide range of safety-critical applications [1], the development of secure
and reliable communication systems has become increasingly important. Many
communication protocols have been developed to ensure a variety of properties in
a broad range of environments. To maximize clarity and code re-use, systems
are often divided into modules that correspond to individual protocols and can
be combined into stacks of protocol layers, as illustrated in Figure 1. But in the
restricted context of a particular application these modular systems turn out
to be less efficient than monolithic implementations, as they contain redundant
and unused code and require communication between the individual modules.
Furthermore, they have to add headers to a sender’s message to indicate how
the layers in the receiver’s stack need to be activated, although typical data
messages activate only a few protocols. Thus the costs of modularity are
unnecessarily large execution times and increased network load.

By analyzing common sequences of operation, i.e. typical messages and the
normal status of the communication system, one can identify a fast-track through
a protocol stack and reconfigure the system’s code accordingly. Experiments [8,7]
have shown that dramatic efficiency improvements and speedups by a factor of 30 to 50
can be achieved by partial evaluation, elimination of dead code and of communication
between layers, compression of message headers, and delayed state updates.
Fast-track optimizations, however, are not supported by current compilers. Nor can
they be done by hand, as they require a deep understanding of the system’s code
and, due to the code size of typical applications, carry a high risk of error.
Besides, they must be done after configuring the application system and
cannot be provided a priori. Therefore it is necessary to develop formal tools for
an automated reconfiguration of networked application systems.
Fig. 1. Protocol stacking architecture: Layers are linked by FIFO event queues



To overcome the formalization barrier, which had prevented formal tools for 
analyzing software from being used to maximum benefit, we have linked Ensem- 
ble [7], a flexible group communication toolkit, to NuPRL [3], a proof system 
for mathematical reasoning about programs. The resulting logical programming 
environment [10] is capable of formal reasoning about the actual Ensemble 
code and provides the infrastructure for verifying important system properties 
[9] and for applying formal optimization techniques to the system’s code. 

In this paper we will describe the formal techniques for a reconfiguration of 
modular group communication systems that we have implemented within the 
logical programming environment. These techniques support both the developer 
and the user of a group communication toolkit. The former has the expertise to 
identify the path of a typical application message through the code of an individ- 
ual protocol layer, i.e. when the system is in its regular state and does not have 
to deal with message partitioning, retransmission, buffering, synchronization, 
etc. We provide tactics for assumption-based symbolic evaluation of the code, 
which a system developer can use interactively to generate optimized pieces of 
code for each layer. Since these reconfigurations are independent of the
application, they can be included in the distribution of the group communication
toolkit. For the user of the toolkit we provide tools for reconfiguring the code of 
application protocol stacks and compressing the headers of common messages. 
Instead of using conventional rewrite techniques, which would not scale up very 
well, these tools are based on composing formal theorems about the results of 
reconfiguring individual protocol layers and are fully automated. 

In Section 2 we briefly review Ensemble, NuPRL, and the logical programming
environment on which all our techniques are based. Section 3 discusses
tactics for the reconfiguration of individual protocol layers. The theorem-based
reconfiguration of protocol stacks is the topic of Section 4. Section 5 deals with
header compression, while Section 6 describes how to convert the logical reconfigurations
into executable code. Finally, we discuss the interface between the
protocol stack and the application that uses it in Section 7. For additional details
not covered in this paper we refer to our technical report [12].
2 The Logical Programming Environment for Ensemble

Ensemble [7] is the third generation of group communication systems that 
aim at securing critical networked applications. The first system, Isis [2], be- 
came one of the first widely adopted technologies and found its way into many 
safety-critical applications. HORUS [16], a modular redesign of Isis, introduced 
a protocol stacking architecture and techniques for reconfiguring a stack of pro- 
tocol layers [8]. Reconfiguring HORUS protocol stacks, however, is difficult and 
error prone, as the system is written in C and complex to reason about. Concerns 
about the reliability of such a technology base led to the implementation of En- 
semble [6,7], which is based on HORUS but coded almost entirely in Ocaml [14], 
a member of the ML language family. Due to the use of ML, Ensemble is one 
of the most scalable, portable, and also fastest existing reliable multicast sys- 
tems. But the main reason for choosing Ocaml was to enable logical reasoning 
about Ensemble’s code [5,7] and to apply formal methods for reconfiguring the 
system and verifying its properties. 

The NuPRL proof development system [3] is a mathematical framework for rea- 
soning about programs and program transformations. Proof tactics can be tai- 
lored to follow the particular style of reasoning in distributed systems. Program 
optimizations can be achieved by applying rewrite tactics, which reconfigure the 
code of a given application. NuPRL’s formal calculus, Type Theory, includes
formalizations of the fundamental concepts of mathematics, programming, and 
data types. It contains a functional programming language similar to the core 
of ML. The NuPRL system supports interactive and tactical formal reasoning, 
user-defined language extensions, program evaluation, and an extendable library 
of verified knowledge. These features make it possible to represent Ensemble’s 
code and its specifications in NuPRL in order to use the system as a logical pro- 
gramming environment for the development of group communication systems. 

The formal link between Ensemble and NuPRL has been established by con- 
verting the code of Ensemble into terms of NuPRL’s logical language with the 
same meaning. In [11, Section 3] we have developed a type-theoretical semantics
of Ocaml that is faithful with respect to the compiler and manual [14]. As
Ocaml offers many features that are not used in Ensemble’s code but might 
cause complications in a rigorous formalization, we have restricted the formal- 
ization to the subset of Ocaml required for implementing finite state-event 
systems, i.e. the functional subset together with simple imperative features. Our 
formalization was implemented by using NuPRL’s definition mechanism: each 
Ocaml language construct is represented by a new NuPRL term that expresses 
its formal semantics. This abstraction is coupled with a display form that gives 
the formal representation the same outer appearance as the original code. Thus a 
NuPRL term represents both the text of an Ocaml program and its semantics. 

We have also developed a formal programming logic for Ocaml by describing
rules for reasoning about Ocaml expressions and for symbolically evaluating
them [11, Section 4]. We have implemented these rules as NuPRL tactics, which
guarantees that they are correct with respect to the type-theoretical semantics of Ocaml. As
a result, formal reasoning about Ensemble within NuPRL can be performed
entirely at the level of Ocaml programs, while program transformations always
preserve the “OCAML-ness” of terms. This enables system experts to reason
formally about OCAML programs without having to understand type theory.

Finally, we have created tools that convert OCAML programs into their formal 
NuPRL representations and store them as objects of NuPRL’s library [11, 
Section 5]. Using these tools we are able to translate the complete OCAML-code 
of the Ensemble system into NuPRL (and vice versa) and to keep track of 
modifications and upgrades in Ensemble’s implementation. 

3 Tactic-based Reconfiguration of Protocol Layers 

A protocol layer is an independent implementation of a communication protocol 
that does not make any assumption about a sent or received message. It has 
to ensure properties such as FIFO transmission, total message ordering, virtual 
synchrony, group membership, etc. Although the essential algorithms can be 
expressed as simple state-event machines, most of the actual code has to deal 
with unusual situations that do not affect a typical data message. Consequently, 
a system developer who implements or maintains the communication toolkit can 
easily identify the assumptions that the code of each layer makes about common 
sequences of operation. This makes it possible to reconfigure the protocol layers 
independently. To support such a reconfiguration of protocol layers, we have 
developed three basic program transformation mechanisms that rewrite the code 
of a layer under a set of assumptions in a correctness-preserving way.

Symbolic evaluation and function inlining simplify the code of a protocol layer
in the presence of constants or function calls. In a functional language such
as Ocaml both techniques correspond to β-reduction. As our logical programming
environment provides tactics that represent the reduction rules for each
reducible OCAML-expression, all formal rewrite steps operate on the level of the 
programming language and are guaranteed to be correct with respect to the 
type-theoretical semantics of Ocaml. To simplify an interactive application of 
these reductions we have summarized all reduction rules into a single evaluation 
tactic Red. It searches (top-down) for the first reducible subterm of an expression 
and reduces it. If given a subterm address as additional argument, it restricts 
the search accordingly. Thus it supports user-controlled simplifications of the 
code that allow a finer tuning than standard partial evaluation strategies, as the 
system developer can decide which reductions are actually meaningful. 
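The search strategy of Red (reduce the first redex found top-down, optionally restricted to a subterm address) can be sketched on plain λ-terms. This Ocaml sketch is an assumption-laden stand-in: it uses untyped terms with de Bruijn indices and β-reduction only, whereas the actual tactic covers every reducible Ocaml construct; the names red, red_at and dir are hypothetical.

```ocaml
(* Hypothetical untyped lambda terms with de Bruijn indices *)
type term = Var of int | Lam of term | App of term * term

(* shift d c t: add d to every free index >= c *)
let rec shift d c = function
  | Var k -> if k >= c then Var (k + d) else Var k
  | Lam t -> Lam (shift d (c + 1) t)
  | App (t, u) -> App (shift d c t, shift d c u)

(* subst j s t: replace index j by s in t *)
let rec subst j s = function
  | Var k -> if k = j then s else Var k
  | Lam t -> Lam (subst (j + 1) (shift 1 0 s) t)
  | App (t, u) -> App (subst j s t, subst j s u)

(* contract the redex (Lam t) u *)
let beta t u = shift (-1) 0 (subst 0 (shift 1 0 u) t)

(* red: reduce the first redex found top-down, left to right *)
let rec red = function
  | App (Lam t, u) -> Some (beta t u)
  | App (t, u) -> (
    match red t with
    | Some t' -> Some (App (t', u))
    | None -> Option.map (fun u' -> App (t, u')) (red u))
  | Lam t -> Option.map (fun t' -> Lam t') (red t)
  | Var _ -> None

(* a subterm address: go into the function, the argument, or a body *)
type dir = Fn | Arg | Body

(* red_at path t: like red, but only inside the subterm at path *)
let rec red_at path t =
  match path, t with
  | [], _ -> red t
  | Fn :: p, App (f, a) -> Option.map (fun f' -> App (f', a)) (red_at p f)
  | Arg :: p, App (f, a) -> Option.map (fun a' -> App (f, a')) (red_at p a)
  | Body :: p, Lam b -> Option.map (fun b' -> Lam b') (red_at p b)
  | _ -> None

let id = Lam (Var 0)
let k = Lam (Lam (Var 1))
```

Returning an option makes the case where no reducible subterm exists at the requested address explicit, which is where a user-controlled strategy would stop.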

Context-dependent simplifications help extract the relevant code from a protocol
layer. They have to trace the code-path of events that satisfy the assumptions
about the common case and isolate the corresponding pieces of code. Since
assumptions are usually formulated as equalities, this can be done by apply- 
ing equality substitutions followed by reductions of the affected code pieces. We 
have developed a tactic UseHyp, which performs these steps automatically for a 
given assumption. Its implementation is straightforward, as the NuPRL system 
supports equality reasoning and the management of hypotheses. 
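UseHyp combines an equality substitution with subsequent reductions. A hedged Ocaml sketch over a hypothetical toy expression language (only constants, not, or, conditionals and opaque atoms) shows the idea on a conditional shaped like the one in Example 1; it is not the NuPRL tactic.

```ocaml
(* Hypothetical toy expression language; Atom and Leaf are opaque *)
type expr =
  | Const of bool
  | Atom of string
  | Not of expr
  | Or of expr * expr
  | If of expr * expr * expr
  | Leaf of string

(* equality substitution: replace every occurrence of lhs by rhs *)
let rec substitute lhs rhs e =
  if e = lhs then rhs
  else
    match e with
    | Not a -> Not (substitute lhs rhs a)
    | Or (a, b) -> Or (substitute lhs rhs a, substitute lhs rhs b)
    | If (c, t, f) ->
      If (substitute lhs rhs c, substitute lhs rhs t, substitute lhs rhs f)
    | Const _ | Atom _ | Leaf _ -> e

(* reduce not, or and conditionals whose arguments became constants *)
let rec simplify e =
  match e with
  | Not a -> (match simplify a with Const b -> Const (not b) | a' -> Not a')
  | Or (a, b) ->
    (match simplify a, simplify b with
     | Const true, _ | _, Const true -> Const true
     | Const false, x | x, Const false -> x
     | a', b' -> Or (a', b'))
  | If (c, t, f) ->
    (match simplify c with
     | Const true -> simplify t
     | Const false -> simplify f
     | c' -> If (c', simplify t, simplify f))
  | Const _ | Atom _ | Leaf _ -> e

let use_hyp (lhs, rhs) e = simplify (substitute lhs rhs e)

(* shape of the conditional in the Bottom example *)
let cond =
  If (Or (Atom "s_bottom.all_alive", Not (Atom "s_bottom.failed.(getPeer ev)")),
      Leaf "pass UpM(ev,hdr) up",
      Leaf "free name ev")
```

Substituting false for the failure flag turns the condition into a constant, so the else branch disappears; an unrelated assumption leaves the expression unchanged.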
type header = (* variant record type for possible headers *)
type state = (* record type for the layer's state *)
let init () (ls, vs) = { (* initial state of the layer *) }

let hdlrs s (ls, vs) {up_out=up; upnm_out=upnm; dn_out=dn; dnlm_out=dnlm; dnnm_out=dnnm} =
  let up_hdlr ev abv hdr = ...
  and uplm_hdlr ev hdr = ...
  and upnm_hdlr ev = ...
  and dn_hdlr ev abv = ...
  and dnnm_hdlr ev = ...
  in {up_in=up_hdlr; uplm_in=uplm_hdlr; upnm_in=upnm_hdlr; dn_in=dn_hdlr; dnnm_in=dnnm_hdlr}

let l args vs = Layer.hdr init hdlrs args vs

Fig. 2. Code structure of Ensemble’s protocol layers

Specialized transformation tactics take advantage of common structures in the 
code of protocol layers. For the sake of clarity, message events are often separated 
according to their direction (outgoing or incoming), type (sending, broadcasting, 
or others), and further criteria. Separate message handlers are applied to each 
case but the separation itself follows a common pattern. A detailed analysis of 
this common structure makes it possible to write a tactic that rewrites the code of 
a layer until the relevant message handler is isolated. In contrast to the previous 
techniques this tactic does not require search but applies a fixed series of rewrite 
steps. Furthermore, it may use transformations that cannot be represented by 
partial evaluation, as they are based on η-reductions, distributive laws, and other
“undirected” equalities. Applying a specialized transformation tactic as a first 
step drastically simplifies the reconfiguration process for the system developer: 
it performs the tedious initial steps automatically and reduces the size of the 
code to be reconfigured interactively to a fraction of the original code. 

We have developed a tactic RedLayerStructure, which is based on the common
structure of Ensemble’s protocol layers. Figure 2 shows the typical structure
of such a layer l. Essentially, it consists of two functions. The function init
initializes the state of the layer according to a given view state vs and local state
information ls, while hdlrs describes how the layer’s state is affected by an incoming
event and which events will be sent to adjacent layers. To this end, it
describes how the event handlers of a protocol stack are transformed by inserting
l into the stack. Ensemble distinguishes five sub-handlers for up- and down-going
regular events (up, dn), events with no headers (upnm, dnnm), and local
events (uplm or dnlm). The layer l itself is created from init and hdlrs through
the function Layer.hdr, which glues the five sub-handlers together. This makes
it possible to convert the implementation of a layer into a functional, imperative,
or threaded version while keeping a single reference implementation.

To reconfigure a layer l, we state the assumptions about common events
and layer states and optimize the event handler of l, starting with the expression

let (_, hdlr) = convert Functional l args (ls, vs) in hdlr (s_l, event)

where convert generates the functional version of l, which consists of the initial
state and the event handler hdlr, s_l is the current state of l, and event is
an event of the form UpM(ev,hdr) or DnM(ev,hdr). The assumptions about the
common case are stated in the form of equations that characterize the type of
the event ev (send or broadcast), the structure of the header hdr (full, no, or
local header), and the state s_l. For the sake of clarity we suppress irrelevant
details by using the following formal abbreviation for the above expression:

RECONFIGURE LAYER l FOR EVENT event AND STATE s_l ASSUMING assumptions

As a first step in a reconfiguration we use the tactic RedLayerStructure.
It unfolds the formal abbreviation and evaluates convert Functional l args
(ls, vs), which leads to rewriting Layer.hdr init hdlrs args (ls, vs). After
applications of reduction rules, laws about nested let-abstractions, and
η-reductions, the tactic isolates the relevant handler from up_hdlr ... dnnm_hdlr
by matching event, which has the form UpM(ev,hdr) or DnM(ev,hdr) where hdr
is Full(hdr,abv), NoMsg, or Local hdr, against a case expression in the code
of Layer.hdr. As a result we get the code of the relevant message handler.

Afterwards we apply the tactic UseHyp to use assumptions whenever they 
help to eliminate branches of a conditional or a case expression and apply top- 
level reductions (i.e. the tactic Red) to make other assumptions applicable. We 
continue with controlled reductions until no more optimization is meaningful. 

These three basic tactics thus lead to a simple methodology for reconfiguring 
protocol layers. Except for the decision when to finish the transformation, which 
requires some insight into the implementation and can only be made by a system 
developer, it can be almost completely automated. We have used it successfully 
to reconfigure 25 of Ensemble’s 40 protocol layers for the four most common 
kinds of events, i.e. incoming and outgoing send- or broadcast messages. In many 
cases a layer consisting of 300-500 lines of code is reduced to a simple update of 
the state and a single event to be passed to the next layer. 

Example 1. We illustrate the reconfiguration of the Bottom layer for received broadcast
messages. These messages have the form UpM(ev, Full(header, hdr)) where (1)
ev is a broadcast event (getType ev = ECast), (2) there is no header (header = NoHdr),
and (3) in the state s_bottom there is no record of a previous failure of the sender
(s_bottom.failed.(getPeer ev) = false). We start with

⊢ RECONFIGURE LAYER Bottom
  FOR EVENT UpM(ev, Full(NoHdr, hdr))
  AND STATE s_bottom
  ASSUMING getType ev = ECast ∧ s_bottom.failed.(getPeer ev) = false

Applying RedLayerStructure makes the assumptions explicit and then evaluates the 
remaining program expression as described above. 

ASSUME 1. getType ev = ECast
       2. s_bottom.failed.(getPeer ev) = false
⊢ (match (getType ev, NoHdr) with
     ((ECast | ESend), NoHdr) -> ...
   | (ECast, Unrel) -> ...
   ⋮
  ) (s_bottom, Fqueue.empty)

where ‘...’ indicates that details of the code are temporarily hidden from the
display. We now call UseHyp 1, which leads to an evaluation of the first case of the case
expression and eliminates all other cases from the code.

⊢ if s_bottom.all_alive or (not (s_bottom.failed.(getPeer ev)))
  then (s_bottom, Fqueue.add UpM(ev,hdr) Fqueue.empty)
  else free name ev

Next, we use the assumption s_bottom.failed.(getPeer ev) = false and evaluate
the first case of the conditional by calling UseHyp 2. This results in

⊢ (s_bottom, Fqueue.add UpM(ev,hdr) Fqueue.empty)
No further reductions are meaningful, as the resulting state s_bottom and the queue of
outgoing events, a queue containing the single event UpM(ev,hdr), are explicitly stated.
Under the given assumptions we now know

hdlr_b (s_bottom, UpM(ev, Full(NoHdr, hdr))) = (s_bottom, [:UpM(ev,hdr):])

where hdlr_b denotes the event handler of the Bottom layer and [:UpM(ev,hdr):] abbreviates
Fqueue.add UpM(ev,hdr) Fqueue.empty. This means that the state of the
layer remains unchanged, while the original message is passed to the next layer after
the header NoHdr has been stripped off.
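The equation derived in Example 1 can be checked on a small executable model. The Ocaml sketch below is hypothetical: it replaces Fqueue by a list, models only the fields used in the example, replaces the elided match cases and free name ev by handlers that emit no events, and then compares this simplified general handler with the reconfigured fast path.

```ocaml
(* Hypothetical model of the Bottom example; lists stand in for Fqueue *)
type etype = ECast | ESend
type ev = { etype : etype; peer : int }
type header = NoHdr | Unrel
type state = { all_alive : bool; failed : bool array }
type up_event = UpM of ev * string

(* simplified general handler for UpM(ev, Full(header, hdr)) *)
let hdlr s (ev, header, hdr) =
  match ev.etype, header with
  | (ECast | ESend), NoHdr ->
    if s.all_alive || not s.failed.(ev.peer) then (s, [ UpM (ev, hdr) ])
    else (s, []) (* stands in for: free name ev *)
  | _, Unrel -> (s, []) (* placeholder for the elided cases *)

(* reconfigured fast path, valid only under the two assumptions *)
let hdlr_fast s (ev, _header, hdr) = (s, [ UpM (ev, hdr) ])

let s = { all_alive = false; failed = [| false; true |] }
let e0 = { etype = ECast; peer = 0 }
```

For a broadcast from a peer that has not failed, both handlers agree; for a failed peer the assumption is violated and the fast path no longer matches the general handler.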

Verifying a reconfiguration. A fast-track reconfiguration in NuPRL is more 
than just a syntactical transformation of program code. Since it is based entirely 
on substitution, evaluation, and verified laws, we know that under the given 
assumptions a reconfigured program is equivalent to the original one. But in 
order to guarantee the reliability of a reconfigured communication system we 
must provide a formal proof of this equivalence. Formally, we have to prove

   let (s,hdlr) = convert Functional l args (ls,vs) in hdlr(s_l, event)
      = (s'_l, [:out-events:])

where the left equand is the starting point of a reconfiguration and the right
equand its final result, consisting of a modified state s'_l and a queue of outgoing
events [:out-events:]. Again we introduce a formal abbreviation:

   RECONFIGURING LAYER l FOR EVENT event AND STATE s_l ASSUMING assumptions
   YIELDS EVENTS [:out-events:] AND STATE s'_l
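Read operationally, the abbreviation says: whenever the assumptions hold, running the layer's handler on the given state and event returns exactly the stated state and event queue. The sketch below encodes that reading for a single concrete input. The function name and its labelled arguments are hypothetical, and a check on one input is much weaker than the universally quantified theorem proved in NuPRL.

```ocaml
(* Hypothetical executable reading of the abbreviation for one concrete
   state and event: assumptions imply hdlr (s, event) = (s', out_events). *)
let reconfiguring ~handler ~state ~event ~assuming ~yields_events ~yields_state =
  (not (assuming state event))
  || handler (state, event) = (yields_state, yields_events)

(* Usage: a toy layer that counts the events it forwards unchanged. *)
let counting_layer (n, ev) = (n + 1, [ ev ])

let holds =
  reconfiguring ~handler:counting_layer ~state:3 ~event:"msg"
    ~assuming:(fun _ _ -> true) ~yields_events:[ "msg" ] ~yields_state:4
```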

Fortunately, there is a close correspondence between our reconfiguration mech- 
anisms and the logical inference rules of the NuPRL proof development sys- 
tem. It is easy to write proof tactics that perform exactly the same steps on 
the left-hand side of an equation as our reconfiguration tactics Red, UseHyps,
and RedLayerStructure did on the code of the protocol layer. We can therefore
consider the trace of a reconfiguration as a plan for the equivalence proof
and transform each reconfiguration step into the corresponding proof step. This
makes it possible to prove the equivalence theorem completely automatically,
even in cases where the reconfiguration required considerable user interaction.

We have written a tactic CreateReconfVerify, which states the equivalence 
theorem and proves it to be correct by replaying the derivation of the reconfig- 
ured code. Since the tactic is guaranteed to succeed, it runs as a background 
process after a reconfiguration has been finished. 

4 Theorem-based Protocol Stack Reconfiguration 

In contrast to individual layers, protocol stacks have no a priori implementation 
but are defined according to the demands of the application system. As there are 
thousands of possible configurations, a designer of an application system who 
uses a group communication toolkit must also be given a tool that creates a 
fast-track reconfiguration of the application system automatically. 

It is easy to see that tactic-based rewrite techniques are not appropriate
for this purpose, as they require interaction and expertise about the code of the
layers and of the mechanism for composing layers. Furthermore, they do not scale
up very well: since messages may create additional events on their path through
a protocol stack, the reconfiguration tactics would have to deal with the entire
code of the stack at once, which means that each rewrite step must operate on
extremely large terms (representing more than 10000 lines of code).

Fig. 3. Reconfiguration methodology: composing reconfiguration theorems

On the other hand, a fast-path through a protocol stack is characterized by 
the fact that events pass through the stack without generating more than one or 
two additional events. Thus it is possible to derive the result of passing a common 
event through a protocol stack from already known reconfiguration results for 
the individual protocol layers: instead of having to symbolically evaluate the 
complete code from scratch we compose the individual reconfiguration results 
according to our knowledge about the code for layer composition. 
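For the simplest case, a linear down-going trace, composing two known fast-path results amounts to threading the header from the upper layer to the lower one while pairing up the states; this mirrors the shape of the theorem ComposeDnLinear in Figure 4. In the sketch, a layer's fast path is modelled, hypothetically, as a function from a state and a header to a new state and the single header it emits.

```ocaml
(* A linear down fast path: state and incoming header to new state and
   the one header emitted downwards (hypothetical model). *)
type ('s, 'h1, 'h2) linear_dn = 's -> 'h1 -> 's * 'h2

(* States are updated independently; the header produced by the upper
   layer is the input of the lower layer. *)
let compose_dn_linear (upper : ('su, 'h, 'h1) linear_dn)
    (lower : ('sl, 'h1, 'h2) linear_dn) : ('su * 'sl, 'h, 'h2) linear_dn =
 fun (s_up, s_low) hdr ->
  let s1_up, hdr1 = upper s_up hdr in
  let s1_low, hdr2 = lower s_low hdr1 in
  ((s1_up, s1_low), hdr2)

(* Usage: an upper layer that stamps a sequence number and a stateless
   lower layer that only adds a marker. *)
let stamp n hdr = (n + 1, (n, hdr))
let mark () hdr = ((), ("NoHdr", hdr))
let stack s hdr = compose_dn_linear stamp mark s hdr
```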

Technically, we do this by composing formal theorems, as illustrated in Figure 3.

— For each protocol layer we prove reconfiguration theorems about the result 
of reconfiguring its code for the most common types of events, i.e. up- and 
down-going send- and broadcast messages. Since these theorems only de- 
pend on the implementation of the protocol layers but not on the particular 
application system, they can be proven once and for all and be included in 
the distribution of the communication toolkit. For Ensemble we use the 
equivalence theorems that are generated automatically after finishing the 
reconfiguration of a layer, as discussed in Section 3.

— For composing fast-paths through individual layers into a fast-path through 
a protocol stack we prove composition theorems about common combinations
of fast-paths, such as linear traces (where an event passes through a layer),
bouncing events, and messages that cause several events to be emitted from a
layer (splitting), both for up- and down-going events. While the statements
of these theorems often appear trivial, their proofs are rather complex as we
have to reason about the actual code of layer composition and to perform
all steps that would usually occur during a reconfiguration. By proving the
composition theorems we express the logical laws of layer composition as
derived inference rules and remove a significant deductive burden from the
reconfiguration process: reconfiguring composed protocol layers can now be
done by theorem application in a single inference step where a tactic-based
reconfiguration would have to execute hundreds of elementary steps.

THM ComposeDnLinear

     RECONFIGURING LAYER Upper FOR EVENT DnM(ev, hdr) AND STATE s_up
     YIELDS EVENTS [:DnM(ev, hdr1):] AND STATE s1_up
   ∧ RECONFIGURING LAYER Lower FOR EVENT DnM(ev, hdr1) AND STATE s_low
     YIELDS EVENTS [:DnM(ev, hdr2):] AND STATE s1_low
   ⇒ RECONFIGURING LAYER Upper ||| Lower FOR EVENT DnM(ev, hdr) AND STATE (s_up, s_low)
     YIELDS EVENTS [:DnM(ev, hdr2):] AND STATE (s1_up, s1_low)

Fig. 4. Reconfiguration theorem for linear down traces

Figure 4 presents a reconfiguration theorem for composing down-going linear 
traces in Ensemble. Assuming that a down-going event through the layers 
Upper and Lower yields a queue consisting of a single down-event and possi- 
bly modifies the state of these layers, we prove that sending the event through 
the composed stack Upper ||| Lower (where ||| is Ensemble’s composition 
function) does the obvious: states will be updated independently while the 
event is first modified by the upper layer and then by the lower layer. 

— Using the above theorems we can generate and prove reconfiguration theo- 
rems for a given protocol stack. To create the statement of such a theorem 
we consult the theorems about layer reconfigurations for the corresponding 
events and compose them as described by the composition theorems. Start- 
ing with the top of the stack we match incoming and outgoing events of 
the theorems for adjacent layers to determine the structure of the event that 
must enter the stack and the result of passing it through the stack. The states 
of the layers will be composed into tuples of states and the assumptions will 
be accumulated by conjunctions. 

To prove the stack reconfiguration theorem we use the information that we 
had gained while stating it. We instantiate the reconfiguration theorems of 
the layers in the stack with the actual event that will enter them. We then 
apply step by step the appropriate composition theorems to compose the 
fast-paths through the stack until the result is identical to the original state- 
ment of the theorem. Both proof steps are very easy to implement as they 
only require us to apply instantiated versions of already proven theorems. 
For Ensemble we have developed a tactic CreateReconfiguredStack that,
given a list of layer names, generates the reconfiguration theorem, proves
it correct, and stores it under a unique name. Since all of these steps are
completely automated, the tactic does not require any user interaction and
can therefore be integrated into Ensemble’s configurator.

— From the logical reconfiguration theorems we finally generate Ocaml code 
for a modified protocol stack that can be used to replace the original stack. 
We will discuss code generation in Section 6 after describing how to optimize
a reconfigured stack by header compression (see Section 5).

Theorem-based layer composition leads not only to fully automated reconfigu- 
ration techniques but also to a much clearer style of reasoning as we raise the 
abstraction level of program transformations from programming language ex- 
pressions to reasoning about modules. It also improves the performance of the 
reconfiguration process, which requires only a few steps for each protocol layer 
passed by an event and thus scales up very well. Finally, system updates can be
handled much more easily: the modification of a layer’s code only requires reproving
the reconfiguration theorems for this particular layer, while the reconfiguration
of the protocol stack remains unaffected or is re-executed automatically.

Example 2. To reconfigure the stack Pt2pt ||| Mnak ||| Bottom for outgoing
send-messages, CreateReconfiguredStack consults the following reconfiguration theorems.

THM Pt2ptReconfDnMESend_verif

   RECONFIGURING LAYER Pt2pt FOR EVENT DnM(ev, hdr) AND STATE s_pt2pt
   ASSUMING getType ev = ESend ∧ getPeer ev ≠ ls.rank
   YIELDS EVENTS [:DnM(ev, Full(Data(Iq.hi s_pt2pt.sends.(getPeer ev)), hdr)):]
   AND STATE s_pt2pt[.sends.(getPeer ev) ← Iq.add s_pt2pt.sends.(getPeer ev) (getIov ev) hdr]

THM MnakReconfDnMESend_verif

   RECONFIGURING LAYER Mnak FOR EVENT DnM(ev, hdr) AND STATE s_mnak
   ASSUMING getType ev = ESend
   YIELDS EVENTS [:DnM(ev, Full(NoHdr, hdr)):]
   AND STATE s_mnak

THM BottomReconfDnMESend_verif

   RECONFIGURING LAYER Bottom FOR EVENT DnM(ev, hdr) AND STATE s_bottom
   ASSUMING getType ev = ESend ∧ s_bottom.enabled
   YIELDS EVENTS [:DnM(ev, Full(NoHdr, hdr)):]
   AND STATE s_bottom

Since all three layers show a linear behavior, they have to be composed in a way
that makes the theorem ComposeDnLinear applicable. The incoming event of theorem
Pt2ptReconfDnMESend_verif describes the event that enters the three-layer stack. The
outgoing event of Pt2pt is matched against the incoming event of Mnak and the variable
hdr in theorem MnakReconfDnMESend_verif is instantiated accordingly. Similarly, the
outgoing event of Mnak is matched against the incoming event of Bottom. The
instantiated outgoing event of theorem BottomReconfDnMESend_verif describes the event
queue emitted by the stack Pt2pt ||| Mnak ||| Bottom. The initial and resulting states of
the three (instantiated) theorems are composed into triples and the assumptions are
composed by conjunction. As a result, CreateReconfiguredStack creates and proves
the following reconfiguration theorem.

   RECONFIGURING LAYER Pt2pt ||| Mnak ||| Bottom
   FOR EVENT DnM(ev, hdr)
   AND STATE (s_pt2pt, s_mnak, s_bottom)
   ASSUMING getType ev = ESend ∧ getPeer ev ≠ ls.rank ∧ s_bottom.enabled
   YIELDS EVENTS [:DnM(ev, Full(NoHdr, Full(NoHdr,
                      Full(Data(Iq.hi s_pt2pt.sends.(getPeer ev)), hdr)))):]
   AND STATE ( s_pt2pt[.sends.(getPeer ev) ← Iq.add s_pt2pt.sends.(getPeer ev) (getIov ev) hdr]
             , s_mnak
             , s_bottom
             )
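The matching of outgoing and incoming events in Example 2 can be replayed on a symbolic model of the headers. In the sketch below, the header type, the reduction of the Pt2pt state to a counter standing in for Iq.hi, and the assumption that Iq.add advances that counter by one are simplifications chosen for illustration; the Mnak and Bottom states are returned unchanged, as in their theorems.

```ocaml
(* Symbolic headers (hypothetical model of the constructors in Example 2). *)
type hdr = NoHdr | Data of int | Full of hdr * hdr | Payload of string

(* Down-going send fast paths, one per layer. The Pt2pt state is a
   counter standing in for Iq.hi; we assume adding a message advances it. *)
let pt2pt_dn hi hdr = (hi + 1, Full (Data hi, hdr))
let mnak_dn s_mnak hdr = (s_mnak, Full (NoHdr, hdr))
let bottom_dn s_bottom hdr = (s_bottom, Full (NoHdr, hdr))

(* Linear composition: each layer's outgoing header is the next layer's
   incoming header; the states are collected in a triple. *)
let stack_dn (s_pt2pt, s_mnak, s_bottom) hdr =
  let s1_pt2pt, hdr1 = pt2pt_dn s_pt2pt hdr in
  let s1_mnak, hdr2 = mnak_dn s_mnak hdr1 in
  let s1_bottom, hdr3 = bottom_dn s_bottom hdr2 in
  ((s1_pt2pt, s1_mnak, s1_bottom), hdr3)

(* The assumptions of the three theorems, accumulated by conjunction. *)
type ctx = { is_send : bool; peer : int; rank : int; enabled : bool }

let stack_assumption ctx =
  List.for_all (fun p -> p ctx)
    [ (fun c -> c.is_send && c.peer <> c.rank);  (* Pt2pt *)
      (fun c -> c.is_send);                      (* Mnak *)
      (fun c -> c.is_send && c.enabled) ]        (* Bottom *)
```

The header computed by stack_dn has the same nesting as the one in the stack theorem above.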



5 Header Compression 



After reconfiguring a protocol stack we know exactly which headers are added 
to a typical data message by the sender’s stack and how the receiver’s stack 
processes these headers in the respective layers. A message that goes through the
fast-path obviously does not activate many protocol layers. Consequently, most
of the added headers indicate that the layer has not been active. Such information
does not have to be transmitted over the net if we encode the fact that the
message has gone through the fast-path. Transmitting only the relevant headers
will reduce the net load and improve the overall efficiency of communication.

THM Compress

     RECONFIGURING LAYER L FOR EVENT DnM(ev, hdr) AND STATE s
     YIELDS EVENTS [:DnM(ev, hdr1):] AND STATE s1
   ⇒ RECONFIGURING LAYER L WRAPPED WITH COMPRESSION FOR EVENT DnM(ev, hdr) AND STATE s
     YIELDS EVENTS [:DnM(ev, compress hdr1):] AND STATE s1

THM Expand

     RECONFIGURING LAYER L FOR EVENT UpM(ev, expand hdr) AND STATE s
     YIELDS EVENTS [:UpM(ev, hdr1):] AND STATE s1
   ⇒ RECONFIGURING LAYER L WRAPPED WITH COMPRESSION FOR EVENT UpM(ev, hdr) AND STATE s
     YIELDS EVENTS [:UpM(ev, hdr1):] AND STATE s1

Fig. 5. Compression and expansion theorems for down/up-traces

A straightforward method for eliminating irrelevant headers from a trans- 
mitted message is to generate code for compressing and expanding headers and 
to insert it between the protocol stack and the net. Compression removes all 
the constants from a header and leaves only the information that may vary. 
In the stack Pt2pt ||| Mnak ||| Bottom from Example 2, for instance, an outgoing
send-message receives the header

   Full(NoHdr, Full(NoHdr, Full(Data(Iq.hi s_pt2pt.sends.(getPeer ev)), hdr)))

This header contains keyword constants like Full, NoHdr, and Data that do not
carry essential information. Without loss of information it can be compressed to

   OptSend(Iq.hi s_pt2pt.sends.(getPeer ev), hdr)

To create the code for compression and expansion, we consult the reconfiguration 
theorems for received common messages and look at the structure of the headers 
of incoming events. Compression matches a header against this pattern and 
generates a new header only from the free variables in the pattern while removing 
all the constants. Headers that do not match such a pattern will not be changed. 
Header expansion simply inverts compression. Both programs can be generated 
automatically after a reconfiguration. For the stack Pt2pt ||| Mnak ||| Bottom, for 
instance, we get the following two programs. 

   let compress hdr = match hdr with
       Full(NoHdr, Full(NoHdr, Full(Data(seqno), hdr))) -> OptSend(seqno, hdr)
     | Full(NoHdr, Full(Data(seqno), Full(NoHdr, hdr))) -> OptCast(seqno, hdr)
     | hdr -> Normal(hdr)

   let expand hdr = match hdr with
       OptSend(seqno, hdr) -> Full(NoHdr, Full(NoHdr, Full(Data(seqno), hdr)))
     | OptCast(seqno, hdr) -> Full(NoHdr, Full(Data(seqno), Full(NoHdr, hdr)))
     | Normal(hdr) -> hdr
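The two programs above can be seen as instances of one scheme: each header pattern taken from a reconfiguration theorem has constant parts and free variables (holes); compression keeps only the values bound to the holes, and expansion rebuilds the header from the pattern. The sketch below is a hypothetical generic version of that scheme. The pattern language, the encoding of a compressed header as a pattern index plus hole values, and the Payload constructor are assumptions for illustration, not the generator used in NuPRL.

```ocaml
(* Symbolic headers (hypothetical model). *)
type hdr = NoHdr | Data of int | Full of hdr * hdr | Payload of string

(* Header patterns: constants must match exactly; PSeqno and PAny are holes. *)
type pat = PNoHdr | PSeqno | PAny | PFull of pat * pat

(* Values bound to holes. *)
type value = Int of int | Hdr of hdr

(* Compressed header: index of the matching pattern plus hole values,
   or the unchanged header when no pattern matches. *)
type compressed = Opt of int * value list | Normal of hdr

(* Collect hole values left to right; None if the shape differs. *)
let rec capture pat hdr =
  match pat, hdr with
  | PNoHdr, NoHdr -> Some []
  | PSeqno, Data n -> Some [ Int n ]
  | PAny, h -> Some [ Hdr h ]
  | PFull (p1, p2), Full (h1, h2) ->
      (match capture p1 h1, capture p2 h2 with
       | Some v1, Some v2 -> Some (v1 @ v2)
       | _ -> None)
  | _ -> None

(* Rebuild a header from a pattern, consuming hole values in order. *)
let rec rebuild pat vals =
  match pat, vals with
  | PNoHdr, rest -> Some (NoHdr, rest)
  | PSeqno, Int n :: rest -> Some (Data n, rest)
  | PAny, Hdr h :: rest -> Some (h, rest)
  | PFull (p1, p2), rest ->
      (match rebuild p1 rest with
       | None -> None
       | Some (h1, rest1) ->
           (match rebuild p2 rest1 with
            | None -> None
            | Some (h2, rest2) -> Some (Full (h1, h2), rest2)))
  | _ -> None

let compress patterns hdr =
  let rec go i = function
    | [] -> Normal hdr
    | p :: ps ->
        (match capture p hdr with
         | Some vals -> Opt (i, vals)
         | None -> go (i + 1) ps)
  in
  go 0 patterns

let expand patterns = function
  | Normal hdr -> hdr
  | Opt (i, vals) ->
      (match rebuild (List.nth patterns i) vals with
       | Some (hdr, []) -> hdr
       | _ -> invalid_arg "expand: malformed compressed header")

(* The send and broadcast patterns of the stack Pt2pt ||| Mnak ||| Bottom. *)
let opt_send = PFull (PNoHdr, PFull (PNoHdr, PFull (PSeqno, PAny)))
let opt_cast = PFull (PNoHdr, PFull (PSeqno, PFull (PNoHdr, PAny)))
let patterns = [ opt_send; opt_cast ]
```

Since expansion only re-inserts the constants of the pattern, expand patterns (compress patterns h) returns h for every header h.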

Header compression can easily be integrated into the reconfiguration process. For 
this purpose we reconfigure the code of a protocol stack after wrapping it with 
compression. By doing so we generate an optimized stack that directly operates 
on compressed messages. Again we propose a theorem-based approach: since 
we already know how to reconfigure a regular protocol stack we prove generic 
compression and expansion theorems that describe the outcome of reconfiguring 
a wrapped stack in terms of the results of reconfiguring the regular stack. 

Figure 5 presents the compression and expansion theorems for Ensemble. 
They describe the obvious effect of applying Ensemble’s function wrap_hdr to
a stack L and functions compress and expand, which we formally abbreviate
by L WRAPPED WITH COMPRESSION. Proving the theorems removes another bur- 
den from the reconfiguration process: we can now make the transition from a 
reconfigured ordinary stack to its wrapped version in a single inference step. 
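The content of the two theorems in Figure 5 becomes visible on a toy model of the wrapper: outgoing headers are compressed after the layer has produced them, and incoming headers are expanded before the layer sees them. The function names wrap_dn and wrap_up are hypothetical and only approximate Ensemble's wrapping function; layers are modelled as functions from a state and a header to a new state and a list of emitted headers.

```ocaml
(* Down-going: run the layer, then compress every emitted header
   (the shape of theorem Compress). *)
let wrap_dn compress dn s hdr =
  let s1, emitted = dn s hdr in
  (s1, List.map compress emitted)

(* Up-going: expand the incoming header, then run the layer
   (the shape of theorem Expand). *)
let wrap_up expand up s chdr = up s (expand chdr)

(* Usage with toy layers and a reversible toy compression that adds
   and removes a two-character prefix. *)
let toy_dn s hdr = (s + 1, [ hdr ^ "!" ])
let toy_up s hdr = (s, [ "got " ^ hdr ])
let toy_compress h = "C:" ^ h
let toy_expand c = String.sub c 2 (String.length c - 2)
```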

Based on compression and expansion theorems a reconfiguration of wrapped 
protocol stacks follows the same methodology as before. To generate the state- 
ment of the reconfiguration theorem, we compose the reconfiguration theorems 
for its layers and then compose the result with the compression and expansion 
theorems. To prove it, we first insert the result of a regular fast-track
reconfiguration. For outgoing messages we then transform emitted headers into the
form ‘compress hdr’ and apply the theorem Compress. For received messages we
transform incoming headers into ‘expand hdr’ and apply the theorem Expand.

For Ensemble we have developed a tactic CreateCompressedStack that 
performs all these steps automatically. For outgoing send-messages through the 
stack Pt2pt ||| Mnak ||| Bottom (cf. Example 2) wrapped with compression, for
instance, it creates and proves the following reconfiguration theorem. 

   RECONFIGURING LAYER Pt2pt ||| Mnak ||| Bottom WRAPPED WITH COMPRESSION
   FOR EVENT DnM(ev, hdr)
   AND STATE (s_pt2pt, s_mnak, s_bottom)
   ASSUMING getType ev = ESend ∧ getPeer ev ≠ ls.rank ∧ s_bottom.enabled
   YIELDS EVENTS [:DnM(ev, OptSend(Iq.hi s_pt2pt.sends.(getPeer ev), hdr)):]
   AND STATE ( s_pt2pt[.sends.(getPeer ev) ← Iq.add s_pt2pt.sends.(getPeer ev) (getIov ev) hdr]
             , s_mnak
             , s_bottom
             )



6 Code Generation 

The reconfiguration theorems for protocol stacks describe how to handle common 
events in a much more efficient way. In order to use these results in a running 
application system we have to convert the theorems into OCAML-code that deals 
with all possible cases. For this purpose we introduce a “switch” that identifies 
the common case and sends fast-path messages to the reconfigured code while 
passing all other messages to the code of the original protocol stack. 

To convert a reconfiguration theorem into pieces of code we transform the recon- 
figuration results described by them, i.e. the modified states of the protocol stack 
and the queue of events to be emitted, into the code that creates these results. 
Modified states of the form s[.f ← e] are converted into an assignment s.f <- e.
No assignments are generated for unmodified states. A queue [:ev_1; ..; ev_n:]
of emitted events is converted into a sequence of calls of event handlers for up-
and down-going events. An event of the form UpM(ev,hdr) will lead to a call of
up ev hdr and DnM(ev,hdr) will lead to dn ev hdr.
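The conversion rules can be sketched as a small generator that prints OCaml source text from a description of a reconfiguration result. The record type, the function names, and the exact output format are assumptions for illustration; in the approach described here the corresponding step is carried out inside NuPRL on the theorems themselves.

```ocaml
type dir = Up | Dn

(* Hypothetical description of a reconfiguration result. *)
type reconf_result = {
  updates : (string * string * string) list;  (* state s, field f, value e *)
  emitted : (dir * string * string) list;     (* direction, event, header *)
}

(* A modified state s[.f <- e] becomes the assignment s.f <- e. *)
let gen_update (s, f, e) = Printf.sprintf "%s.%s <- %s" s f e

(* UpM(ev,hdr) becomes up ev (hdr); DnM(ev,hdr) becomes dn ev (hdr). *)
let gen_event (d, ev, hdr) =
  Printf.sprintf "%s %s (%s)" (match d with Up -> "up" | Dn -> "dn") ev hdr

(* Unmodified states yield no assignment; the emitted events become a
   sequence of handler calls in queue order. *)
let gen_code r =
  String.concat ";\n"
    (List.map gen_update r.updates @ List.map gen_event r.emitted)
```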

To generate the switch we consider all the reconfiguration theorems of the protocol
stack at once and create a case expression that separates the corresponding
pieces of code. Usually we distinguish four fundamental cases: incoming and 
outgoing send- and broadcast messages. Assumptions of the reconfiguration the- 
orems that do not deal with the direction or the type of an event are converted 
into a conditional or into another case expression if free variables occur. As 
an optimization we evaluate assumptions about generated events and eliminate
those which are simple variations of other assumptions or evaluate to true.
Events that fit one of the four patterns and satisfy the conditions of that case will
be directed to the code of the corresponding generated event handler. All other 
events will be passed to the event handler of the original stack. 
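The switch can be pictured as a dispatcher that offers an event to each generated fast path in turn and falls back to the original stack when none applies. The sketch below uses a hypothetical event type and string results in place of real event handlers.

```ocaml
(* Offer the event to each fast path in turn; a fast path returns
   Some result if its pattern and conditions match, None otherwise. *)
let switch fast_paths orig ev =
  let rec try_paths = function
    | [] -> orig ev
    | fp :: rest -> (match fp ev with Some r -> r | None -> try_paths rest)
  in
  try_paths fast_paths

(* Usage with a hypothetical event type and string results. *)
type ev = Send of int * string | Cast of string | Control

let my_rank = 0

let fast_send = function
  | Send (peer, m) when peer <> my_rank -> Some ("fast send " ^ m)
  | _ -> None

let fast_cast = function
  | Cast m -> Some ("fast cast " ^ m)
  | _ -> None

let original_stack (_ : ev) = "original stack"

let handler ev = switch [ fast_send; fast_cast ] original_stack ev
```

A send to oneself fails the condition of the send case and therefore reaches the original stack.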

Using these insights we have developed a tactic ReconfiguredBody, which
automatically generates the complete OCAML-code for the optimized protocol
stack from the reconfiguration theorems for the stack. For the three-layer
stack Pt2pt ||| Mnak ||| Bottom (cf. Example 2) wrapped with compression, for
instance, it creates the following optimized stack.

   let opt_stack state (ls,vs) =
     let up ev hdr (s,q) = (s, Fqueue.add (UpM(ev,hdr)) q)
     and dn ev hdr (s,q) = (s, Fqueue.add (DnM(ev,hdr)) q) in
     let orig_stack = wrap_hdr compress expand (Pt2pt.l ||| Mnak.l ||| Bottom.l) in
     let (s,hdlr) = orig_stack state (ls,vs) in

     let opt_hdlr ((s_pt2pt, s_mnak, s_bottom), event) =
       match event, (getType event) with
       | (DnM(ev,hdr), ESend) ->
           if getPeer ev <> ls.rank && s_bottom.enabled
           then ( Iq.add s_pt2pt.sends.(getPeer ev) (getIov ev) hdr
                ; dn ev (OptSend(Iq.hi s_pt2pt.sends.(getPeer ev), hdr)) )
           else hdlr ((s_pt2pt, s_mnak, s_bottom), event)
       | (DnM(ev,hdr), ECast) -> ...
       | (UpM(ev, OptSend(seqno,hdr)), ESend) -> ...
       | (UpM(ev, OptCast(seqno,hdr)), ECast) -> ...
       | _ -> hdlr ((s_pt2pt, s_mnak, s_bottom), event)
     in
     (s,opt_hdlr)

After the code for the optimized stack has been generated as a NuPRL object 
we prove it to be equivalent to the original stack. We then export the code into 
the OCAML environment source file and compile it into executable code. 



7 The Application Interface 

The protocol stacking architecture depicted in figure 1, which places the applica- 
tion on top of the stack, is a simplified model of the real architecture of efficient 
communication systems. As reliable group communication has to deal with many 
aspects that are not related to the application, application messages should not 
have to pass through the complete protocol stack but only through the protocols 
that are necessary for handling data. Therefore Ensemble connects the appli- 
cation to a designated layer partial_appl within the stack (see left hand side 
of figure 6). Protocols that deal with the management of the group, e.g. with 
stability, merging, leaving, changing groups, virtual synchrony, etc. reside on top 
of this layer and are not used by application messages. 

While this refined architecture improves the efficiency of Ensemble it com- 
plicates a fully formal reconfiguration of its protocol stacks, because the inter- 
action between the partial_appl layer and the application does not rely on 
events anymore. Instead, application messages are processed by two functions 
recv_send and recv_cast, which in turn provide a list of actions that are con- 
verted into events. From the viewpoint of communication, the application is just 
a part of partial_appl and we have to reason about the effects of recv_send and
recv_cast to create a fast-path through partial_appl. These two functions link 
up- and down-going events and may also initiate many new events at once. 
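Why partial_appl resists the theorem-based treatment can be seen on a toy version of this interface: the number of events produced for one received message depends entirely on what the application returns. The action and event types and the echo behaviour below are hypothetical; Ensemble's actual action type and the signatures of recv_send and recv_cast differ.

```ocaml
(* Hypothetical application actions and the events they become. *)
type action = Cast of string | Send of int * string
type event = DnCast of string | DnSend of int * string

let event_of_action = function
  | Cast m -> DnCast m
  | Send (dst, m) -> DnSend (dst, m)

(* A toy echo application: one received broadcast yields an
   acknowledgement to its origin and a rebroadcast. *)
let toy_recv_cast origin msg = [ Send (origin, "ack " ^ msg); Cast msg ]

(* Actions are converted into down-going events, however many there are. *)
let deliver_cast origin msg = List.map event_of_action (toy_recv_cast origin msg)
```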
Fig. 6. Stack reconfiguration including the partial_appl layer

Since the number of emitted events is not fixed, we cannot create theorems 
about the results of reconfiguring partial_appl. Instead, we must directly create 
a reconfigured version of the code of partial_appl, and apply specialized tactics 
to compose the resulting code with the already optimized code for the remaining 
stack. This leads to a transformed application stack as illustrated in figure 6: the 
reconfigured protocol interacts directly with the application and is very efficient. 

While it is comparatively easy to generate the complete code of the reconfigured
stack, proving it to be equivalent to the original one is more difficult. We
cannot use the composition theorems from section 4 for describing the effects of 
composing partial_appl with the rest of the stack but have to use tactics that 
deal specifically with this situation. This makes a verification of the reconfigured 
stack more time consuming than a purely theorem-based approach but does not 
affect the reconfiguration process itself. 

8 Conclusion 

We have presented a variety of formal techniques for improving the performance 
of modular communication systems and applied them to networked systems built 
with the Ensemble group communication toolkit. They provide both interactive 
tools for a system developer, who uses expertise about the code to improve 
individual protocol layers, and fully automated reconfiguration mechanisms for 
a user of the communication toolkit, who designs application systems. 

We have implemented our techniques as tactics of the NuPRL proof de- 
velopment system, which are based on an embedding of Ensemble’s code into 
NuPRL’s logical language. This guarantees the correctness of all optimizations
with respect to the formal semantics of the code and enables us to use theorem- 
based rewriting, which raises the abstraction level of program transformations 
from expressions of the programming language to system modules. This leads 
to a much clearer style of reasoning and makes our reconfiguration techniques
scale up very well. To our knowledge there is no other rigorously formal system 
that can reason about the complete code of realistic applications. 

We have used our techniques to reconfigure the 22-layer protocol stack of a 
running application system, which resulted in significant improvements of the 
stack’s performance. In the future we intend to refine our techniques and to inte- 
grate them into the distribution of the Ensemble group communication toolkit. 
We also plan to add further reasoning capabilities to the logical programming 
environment in order to verify the code of Ensemble’s protocol layers [9] and 
protocol stacks. For this purpose we will integrate general deductive tools, such
as extended type-analysis [4], first-order theorem proving [13], and inductive
proof methods [15], and develop proof tactics that are specifically tailored to
Ensemble’s code. By combining all these techniques into a single environment 
we expect to create a software development infrastructure for the construction 
of efficient and reliable group communication systems. 
Abstract. Ensemble is a widely used group communication system that
supports distributed programming by providing precise guarantees for 
synchronization, message ordering, and message delivery. Ensemble eases 
the task of distributed-application programming, but as a result, ensur- 
ing the correctness of Ensemble itself is a difficult problem. In this paper 
we use I/O automata for formalizing, specifying, and verifying the En- 
semble implementation. We focus specifically on message total ordering, 
a property that is commonly used to guarantee consistency within a 
process group. The systematic verification of this protocol led to the 
discovery of an error in the implementation. 



1 Introduction 

Ensemble [8,16] is a working system for supporting group communication. 
In the group communication model, processes join together to form views 
that vary over time, but at any time a process belongs to exactly one view. 
Ensemble provides precise semantics for message delivery and ordering 
both within a view, and as views change. The Ensemble implementation 
is modular; applications acquire services by constructing layered protocol 
stacks. Ensemble currently provides about 50 protocol layers, and the 
number of useful protocols that can be constructed by composing the 
layers into protocol stacks numbers in the thousands. 

Ensemble eases the task of distributed-application programming by 
supporting properties like failure detection and recovery, process
migration, message ordering, and conflict resolution, through a common
application interface. From one perspective, Ensemble provides a model for
establishing confidence: the critical algorithms are cleanly isolated and 
modularized. From another perspective, the task of verifying thousands 
of protocols is seemingly impossible! Any verification model that we use 
must capture the modularity of Ensemble, and it must be able to provide 
automated assistance for module composition. 

In this paper we present our experience applying I/O automata [13,14] 
to Ensemble. The I/O automaton model provides a good framework for 
modeling Ensemble because: (a) Ensemble layers can be described
formally as automata, and composition of layers corresponds to composition
of automata, (b) the I/O automaton model language supports a range of 
specification, from abstract specifications that characterize services to op- 
erational specifications that characterize program behavior, and (c) the 
automata can be interpreted formally, as part of a mechanical verification 
we are performing with the Nuprl system [5]. We demonstrate our
experience through a case study of the Ensemble total-order protocol, which
specifies an ordering property for message delivery. It is built incremen- 
tally from virtual synchrony, a basic Ensemble service. We present the 
following contributions: 

— EVS, a specification for the safety properties guaranteed by the En- 
semble virtual synchrony layer. 

— ETO, for the Ensemble totally ordered virtual synchrony layer. 

— EVStoETOp, for the local program at node p, used in Ensemble in the 
implementation of ETO using EVS. The original program was written
in OCaml by Mark Hayden [16,8], based on C code developed by 
Robbert van Renesse for the Horus system [17]. 

— a simulation relation showing that the composition of EVS and all the 
EVStoETOp, for all p, implements ETO.

This document gives the specifications and summarizes the proofs 
for the total order case study. The full proofs are given in detail in [9], 
which provides the formal arguments used in the mechanical verification 
using the Nuprl proof development system. At the time of writing, the 
mechanical verification is partially complete. While we do not discuss 
proof automation specifically, the specifications we present were developed 
through a process of reverse-engineering, by hand-translating Ensemble 
code into a Nuprl specification, and the proofs were developed in concert 
with the Nuprl formalism. 

The outline for the rest of the paper is as follows. In Section 2, we 
give a brief description of the I/O automata formalism, and in Section 3, 
we use it to specify the abstract Ensemble client. We specify the EVS and ETO
services in Sections 4 and 5; we develop the layer specification and
its verification in Section 6; and we finish with a discussion of the specific 
ordering properties that led to the discovery of an error in Ensemble and 
Horus in Section 7. 

2 Notation and mathematical foundations 



Sets, functions, sequences. Given a set S not containing ⊥, the notation
S_⊥ refers to the set S ∪ {⊥}. We write ⟨⟩ for the empty sequence.
If a is a sequence, |a| denotes the length of a. We also use the notation
|a|_x to denote the number of elements in a that are equal to x. If a is
a sequence and 1 ≤ i ≤ j ≤ |a|, then a(i) denotes the ith element of a
and a(i..j) denotes the subsequence a(i), ..., a(j). We say that sequence
s is a prefix of sequence t, written as s ≤ t, iff there exists i such that
s = t(1..i).
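For readers who prefer executable definitions, the sequence notation can be transcribed to OCaml lists. The function names are ours, and indices are 1-based to match the text.

```ocaml
(* |a|_x: the number of elements of a equal to x *)
let count x a = List.length (List.filter (fun y -> y = x) a)

(* a(i): the ith element, 1-based *)
let nth a i = List.nth a (i - 1)

(* a(i..j): the subsequence a(i), ..., a(j), for 1 <= i <= j <= |a| *)
let sub a i j = List.filteri (fun k _ -> k + 1 >= i && k + 1 <= j) a

(* s <= t: s is a prefix of t, that is s = t(1..i) for some i *)
let rec is_prefix s t =
  match s, t with
  | [], _ -> true
  | x :: s', y :: t' -> x = y && is_prefix s' t'
  | _ :: _, [] -> false
```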

Views. 𝒫 denotes the universe of all processes. 𝒢 is a totally ordered
set of identifiers used to distinguish views. Within 𝒢, we distinguish view
identifiers g_p, p ∈ 𝒫, one per process p. We assume that these special
view identifiers come before all other view identifiers in the given total
ordering of 𝒢. A view v = ⟨g, P⟩ consists of a view identifier g, g ∈ 𝒢, and
a nonempty set P, P ∈ 2^𝒫, of processors called “members” of the view.
𝒱 = 𝒢 × 2^𝒫 is the set of all views. Given a view v = ⟨g, P⟩, the notation
v.id refers to the view identifier g of view v and the notation v.set refers
to the view membership set P of view v. We distinguish special initial
views v_p = ⟨g_p, {p}⟩ for all p ∈ 𝒫. In specifications that associate at most
one view with each identifier g ∈ 𝒢, we will sometimes refer to the “view”
g, meaning the view with identifier g.

Messages. We denote by ℳ the universe of all possible messages. When
messages are placed in queues, they are often paired with processors, ℳ ×
𝒫. Given a message-processor pair x = ⟨m, p⟩, the notation x.msg refers
to the message m, and x.proc refers to the processor p.

I/O automata. I/O automata provide a reactive model for programs
that react with their environment in an ongoing manner, as described
by Lynch [14]. An automaton consists of a set of actions, classified as
input, output, or internal, a (possibly infinite) set of states, and a set of
transitions, which are (state, action, state) triples. A valid execution is
a state-action sequence s_1 a_1 ... s_i a_i s_{i+1} ... where each triple s_i a_i s_{i+1} is
a transition of the automaton. The I/O automata pseudocode we use in
this paper describes the automaton in three parts: (1) the possible actions
are described in the signature, (2) the state is expressed as a collection
of variables and their domains, (3) the transitions are described with
precondition/effect clauses for each action.
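The notion of a valid execution can be made concrete with a minimal encoding in which the transition relation is a predicate on (state, action, state) triples. This is a hypothetical sketch that ignores the input/output/internal classification and start states; it only checks that every consecutive triple of a finite execution fragment is a transition.

```ocaml
(* Transition relation as a predicate on (state, action, state) triples. *)
type ('s, 'a) automaton = { step : 's -> 'a -> 's -> bool }

(* A finite fragment s1 a1 s2 a2 s3 ... is given as its first state
   and a list of (action, next state) pairs. *)
let rec valid aut s = function
  | [] -> true
  | (a, s') :: rest -> aut.step s a s' && valid aut s' rest

(* Usage: a counter with actions Inc and Reset. *)
type act = Inc | Reset

let counter =
  { step = (fun s a s' -> match a with Inc -> s' = s + 1 | Reset -> s' = 0) }
```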

3 The client automaton C_p

The specification of the Ensemble client is shown in Figure 1. The client
automaton is used to formalize restrictions on the environment in which
Ensemble services exist. There is one client C_p per process p ∈ 𝒫; each
client represents a single process in an Ensemble application. The group 
membership changes over time in three distinct phases, represented by 
three modes. 
C_p

Signature:
   Input:   ETO-BLOCK_p, p ∈ 𝒫
            ETO-NEWVIEW(v)_p, v ∈ 𝒱, p ∈ v.set
            ETO-GPRCV(m)_{p,q}, m ∈ ℳ, p, q ∈ 𝒫
   Output:  ETO-BLOCK-OK_p, p ∈ 𝒫
            ETO-GPSND(m)_p, m ∈ ℳ, p ∈ 𝒫

State:
   mode ∈ {“normal”, “preparing”, “blocked”}, initially “normal”

Transitions:
   input ETO-NEWVIEW(v)_p
      Eff: mode := normal
   input ETO-BLOCK_p
      Eff: mode := preparing
   output ETO-BLOCK-OK_p
      Pre: mode = preparing
      Eff: mode := blocked
   output ETO-GPSND(m)_p
      Pre: mode ≠ blocked
      Eff: none
   input ETO-GPRCV(m)_{p,q}
      Eff: none

Fig. 1. The C_p specification

The client is initialized in the “normal” mode, and it can communicate 
with other processes in the view by sending and receiving messages. When 
a new view is to be installed, Ensemble notifies the client by sending it 
a BLOCK message. The block message puts the client in the “prepar- 
ing” mode; the client may continue to send and receive messages in the 
“preparing” mode. The client may respond to the block request with 
a BLOCK-OK message, which makes the client “blocked.” The client is 
not allowed to send messages in the blocked mode. The transition from 
the “blocked” to the “normal” mode occurs when Ensemble delivers the 
NEWVIEW message, which installs a new view in the client with a poten- 
tially new list of view members. 
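Figure 1 can be read as a three-state machine, as this Python sketch shows. Each method corresponds to one action. The class and method names are hypothetical. Each output action is split into an enabledness check and the action itself, and a violated precondition raises an exception instead of being silently ignored.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    PREPARING = "preparing"
    BLOCKED = "blocked"


class Client:
    """Sketch of the client automaton C_p of Figure 1."""

    def __init__(self, p: str):
        self.p = p
        self.mode = Mode.NORMAL            # initially "normal"

    def eto_block(self) -> None:           # input ETO-BLOCK_p
        self.mode = Mode.PREPARING

    def eto_newview(self, view) -> None:   # input ETO-NEWVIEW(v)_p
        self.mode = Mode.NORMAL

    def eto_gprcv(self, m, q) -> None:     # input ETO-GPRCV(m)_{q,p}: Eff: none
        pass

    def can_block_ok(self) -> bool:        # Pre of output ETO-BLOCK-OK_p
        return self.mode is Mode.PREPARING

    def eto_block_ok(self) -> None:        # output ETO-BLOCK-OK_p
        if not self.can_block_ok():
            raise RuntimeError("ETO-BLOCK-OK requires mode = preparing")
        self.mode = Mode.BLOCKED

    def can_send(self) -> bool:            # Pre of output ETO-GPSND(m)_p
        return self.mode is not Mode.BLOCKED
```

Under this reading, a client can keep sending while it is preparing. It stops only after it has acknowledged the block, and it resumes once the new view arrives.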



4 Ensemble virtual synchrony (EVS) 

Virtual synchrony provides the semantics of group communication. The
view guarantees provided by Ensemble can be summarized with the
following informal properties. EVS-self: if process $p$ installs view $v$, then
$p \in v.set$. EVS-view-order: views are installed in ascending order of view
id. EVS-non-overlap: for any two processes $p$ and $q$ that both install view
$v$, the previous views of $p$ and $q$ must either be the same or be disjoint.

Failures may prevent messages from being delivered, and virtual synchrony
provides the following delivery guarantees. EVS-msg-view: all delivered
messages are delivered in the view in which they were sent. EVS-fifo:
messages between any two processes in a view are delivered in FIFO order.
EVS-sync: any two processes that install a view $v_2$, both with preceding
view $v_1$, deliver the same messages in view $v_1$.
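The view and synchronization properties can be checked mechanically on a recorded run. The Python function below is a hypothetical trace checker, not part of Ensemble. It assumes that a view is a pair (id, member set), that `installs[p]` lists the views $p$ installed in order, and that `delivered[p][g]` lists the (sender, message) pairs $p$ delivered in view $g$. It covers EVS-self, EVS-view-order, EVS-non-overlap, and EVS-sync. Checking EVS-msg-view and EVS-fifo would also require the send events, which this trace format omits. Deliveries are compared as sets, because EVS-sync constrains which messages are delivered.

```python
from typing import Dict, FrozenSet, List, Tuple

View = Tuple[int, FrozenSet[str]]                       # (view id, member set)
Delivered = Dict[str, Dict[int, List[Tuple[str, str]]]]


def check_evs(installs: Dict[str, List[View]], delivered: Delivered) -> List[str]:
    """Return the names of violated properties (an empty list if none)."""
    bad = set()
    prev = {}                                # (p, v) -> view p installed just before v
    for p, views in installs.items():
        for k, v in enumerate(views):
            if p not in v[1]:
                bad.add("EVS-self")
            if k > 0:
                if v[0] <= views[k - 1][0]:
                    bad.add("EVS-view-order")
                prev[(p, v)] = views[k - 1]
    procs = list(installs)
    for a, p in enumerate(procs):
        for q in procs[a + 1:]:
            for v in set(installs[p]) & set(installs[q]):
                vp, vq = prev.get((p, v)), prev.get((q, v))
                if vp is None or vq is None:
                    continue                 # one of them started in v
                if vp != vq and vp[1] & vq[1]:
                    bad.add("EVS-non-overlap")
                if vp == vq:
                    got_p = set(delivered.get(p, {}).get(vp[0], []))
                    got_q = set(delivered.get(q, {}).get(vq[0], []))
                    if got_p != got_q:
                        bad.add("EVS-sync")
    return sorted(bad)
```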
EVS

Signature:
  Input:    EVS-GPSND(m)_p, m ∈ ℳ, p ∈ 𝒫
            EVS-BLOCK-OK_p, p ∈ 𝒫
  Output:   EVS-GPRCV(m)_{q,p}, m ∈ ℳ, p, q ∈ 𝒫
            EVS-BLOCK_p, p ∈ 𝒫
            EVS-NEWVIEW(v)_p, v ∈ 𝒱, p ∈ v.set
  Internal: evs-createview(v), v ∈ 𝒱

State:
  created ⊆ 𝒱, initially {v_p : p ∈ 𝒫}
  for each p ∈ 𝒫:
    mode[p] ∈ {“normal”, “preparing”, “blocked”}, initially “normal”
    all-viewids[p] ⊆ 𝒢, initially {g_p}
  for each p ∈ 𝒫, g ∈ 𝒢:
    pending[p, g] ∈ seqof(ℳ), initially ⟨⟩
  for each p, q ∈ 𝒫, g ∈ 𝒢:
    next[p, q, g] ∈ ℕ⁺, initially 1

Derived variables:
  for each p ∈ 𝒫:
    all-views[p] ⊆ 𝒱, given by {v ∈ created : v.id ∈ all-viewids[p]}
    current-viewid[p] ∈ 𝒢, given by max(all-viewids[p])
    current-view[p] ∈ 𝒱, a default view v ∈ created such that
      v.id = current-viewid[p]
  for each g ∈ 𝒢, p ∈ 𝒫:
    pred-viewid[g, p] ∈ 𝒢 ∪ {⊥}, the largest viewid strictly smaller than g
      in all-viewids[p], if g ∈ all-viewids[p] and any such viewid exists, else ⊥
  for each v ∈ 𝒱, p ∈ 𝒫:
    pred-view[v, p] ∈ 𝒱 ∪ {⊥}, a default view w ∈ all-views[p] such that
      w.id = pred-viewid[v.id, p], if v ∈ all-views[p] and any such w exists, else ⊥

Transitions:
  output EVS-BLOCK_p
    Pre: mode[p] = normal
    Eff: mode[p] := preparing

  input EVS-BLOCK-OK_p
    Eff: mode[p] := blocked

  output EVS-NEWVIEW(v)_p, choose v1
    Pre: mode[p] = blocked
         v1 = current-view[p]
         v ∈ created
         v.id > v1.id
         ∀q ∈ v.set:
           if pred-view[v, q] ≠ ⊥ then
             pred-view[v, q] = v1 ∨ pred-view[v, q].set ∩ v1.set = {}
           if pred-view[v, q] = v1 then
             ∀r ∈ v1.set: next[r, p, v1.id] = next[r, q, v1.id]
    Eff: mode[p] := normal
         all-viewids[p] := all-viewids[p] ∪ {v.id}

  internal evs-createview(v)
    Pre: ∀w ∈ created : v.id > w.id
    Eff: created := created ∪ {v}

  input EVS-GPSND(m)_p
    Eff: append m to pending[p, current-viewid[p]]

  output EVS-GPRCV(m)_{q,p}, choose g
    Pre: g = current-viewid[p]
         pending[q, g] ≠ ⟨⟩
         pending[q, g](next[q, p, g]) = m
    Eff: next[q, p, g] := next[q, p, g] + 1

Fig. 2. EVS specification
The automaton for EVS is shown in Figure 2. This automaton contains
a state shared by all processes, and the external events in the signature
are indexed by processes $p \in \mathcal{P}$. There is one event to match each of the
client events. In addition, there is a new internal action evs-createview$(v)$
that creates new views that may eventually be installed.

In the state, we keep a history for each process. The variable $mode[p]$
represents the mode of client $C_p$. The set $all\text{-}viewids[p]$ holds the identifiers
of all views that have been delivered to process $p$. The sequence
$pending[p, g]$ is the sequence of messages sent by process $p$ in view $g$. The
index $next[q, p, g]$ indicates the next message to be delivered to process $p$
from process $q$ in view $g$ (so $pending[q, g](next[q, p, g])$ is the next message
to be delivered). The view $current\text{-}view[p]$ is the last view that was
delivered to the client, and $pred\text{-}view[v, p]$ is the view delivered to process $p$
just before view $v$.

The transitions for EVS-BLOCK$_p$ and EVS-BLOCK-OK$_p$ mirror the mode changes
of the client. The transition for EVS-GPSND$(m)_p$ places the message $m$ in
the current sequence of pending messages for process $p$, and the transition
for EVS-GPRCV$(m)_{q,p}$ takes a message from the pending queue for process
$q$ and delivers it to process $p$.
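Here is a minimal Python sketch of the two transitions just described. It assumes that the Figure 2 variables `pending`, `next`, and `current-viewid` are stored in dictionaries; the class and method names are hypothetical. Each (sender, receiver, view) triple has a single read index into the sender's queue, so every receiver sees a sender's messages in send order. This is where EVS-fifo comes from. View changes and the block protocol are omitted.

```python
from collections import defaultdict


class EVSQueues:
    """pending/next bookkeeping of Figure 2 (EVS-GPSND and EVS-GPRCV only)."""

    def __init__(self, current_viewid):
        self.current_viewid = dict(current_viewid)   # p -> current view id
        self.pending = defaultdict(list)              # (p, g) -> messages p sent in g
        self.next = defaultdict(lambda: 1)            # (q, p, g) -> 1-based index

    def gpsnd(self, p, m):
        """input EVS-GPSND(m)_p: append m to pending[p, current-viewid[p]]."""
        self.pending[(p, self.current_viewid[p])].append(m)

    def can_gprcv(self, q, p):
        """Pre of EVS-GPRCV(m)_{q,p}: q has an undelivered message for p in g."""
        g = self.current_viewid[p]
        return self.next[(q, p, g)] <= len(self.pending[(q, g)])

    def gprcv(self, q, p):
        """output EVS-GPRCV(m)_{q,p}: return pending[q, g](next[q, p, g]) and advance."""
        if not self.can_gprcv(q, p):
            raise RuntimeError("nothing from q is pending for p")
        g = self.current_viewid[p]
        m = self.pending[(q, g)][self.next[(q, p, g)] - 1]   # 1-based -> 0-based
        self.next[(q, p, g)] += 1
        return m
```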

The EVS-NEWVIEW$(v)_p$ transition requires several properties before a
new view can be delivered to the client $C_p$. EVS-self follows from the signature,
which allows EVS-NEWVIEW$(v)_p$ only for $p \in v.set$. The precondition $v.id > v_1.id$
requires that the identifier of the new view be larger than that of the current
view $v_1$ (which ensures EVS-view-order). For each process $q \in v.set$, the precondition
$pred\text{-}view[v, q] = v_1 \lor pred\text{-}view[v, q].set \cap v_1.set = \{\}$ provides the
EVS-non-overlap property for processes that have already installed view $v$
($pred\text{-}view[v, q] \neq \bot$). The precondition $next[r, p, v_1.id] = next[r, q, v_1.id]$
provides the EVS-sync property: the messages delivered from process $r$ must be
the same for all processes that have installed view $v$ from view $v_1$. These
properties, together with the EVS-fifo property that follows from the ordering of
messages in the pending queues, yield the informal properties claimed by the
designers.
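The precondition of EVS-NEWVIEW$(v)_p$ can be written as a Python predicate. The sketch makes these assumptions: views are (id, member set) pairs; `None` stands for ⊥; `history[q]` lists the views $q$ has installed in ascending id order, so that $pred\text{-}view[v, q]$ is the preceding entry and the current view is the last one; and a missing `nxt` entry means the initial value 1. The function and parameter names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

View = Tuple[int, FrozenSet[str]]        # (view id, member set)


def pred_view(history: Dict[str, List[View]], v: View, q: str) -> Optional[View]:
    """pred-view[v, q]: the view q installed just before v, or None (bottom)."""
    views = history.get(q, [])
    if v not in views:
        return None
    k = views.index(v)
    return views[k - 1] if k > 0 else None


def newview_enabled(p: str, v: View, mode: Dict[str, str], created: Set[View],
                    history: Dict[str, List[View]],
                    nxt: Dict[Tuple[str, str, int], int]) -> bool:
    v1 = history[p][-1]                                  # current-view[p]
    if mode[p] != "blocked" or v not in created or not v[0] > v1[0]:
        return False                                     # includes EVS-view-order
    for q in v[1]:
        w = pred_view(history, v, q)
        if w is not None and w != v1 and (w[1] & v1[1]):
            return False                                 # EVS-non-overlap
        if w == v1:
            for r in v1[1]:                              # EVS-sync
                if nxt.get((r, p, v1[0]), 1) != nxt.get((r, q, v1[0]), 1):
                    return False
    return True
```

In the test scenario, $b$ has already moved from $v_1$ to $v_2$. The check for $a$ fails until $a$ has delivered as many of $c$'s view-1 messages as $b$ did.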

5 Ensemble total order (ETO) 

The ETO service guarantees all of the properties of EVS, and also the
following ordering guarantees on message delivery. ETO-total: any two
messages $m_1$ and $m_2$ delivered to more than one process are delivered
in the same order. ETO-causal: messages are causally ordered; if process
$p_2$ receives a message $m$ from process $p_1$, then it must have received all
messages received by $p_1$ before $m$ was sent.

The automaton for ETO is derived from EVS, with the differences
shown in Figure 3: 1) the EVS-… actions of EVS are renamed with the
ETO: changes from EVS

Signature:
  Input:    ETO-GPSND(m)_p, m ∈ ℳ, p ∈ 𝒫
            ETO-BLOCK-OK_p, p ∈ 𝒫
  Output:   ETO-GPRCV(m)_{q,p}, m ∈ ℳ, p, q ∈ 𝒫
            ETO-BLOCK_p, p ∈ 𝒫
            ETO-NEWVIEW(v)_p, v ∈ 𝒱, p ∈ v.set
  Internal: eto-createview(v), v ∈ 𝒱
            ETO-ORDER(m_f, i, j)_p, m_f ∈ ℳ × (𝒫 → ℕ⁺), i, j ∈ ℕ⁺, p ∈ 𝒫

State:
  for each g ∈ 𝒢:
    queue[g] ∈ seqof(𝒫), initially ⟨⟩
  for each p ∈ 𝒫, g ∈ 𝒢:
    pending[p, g] ∈ seqof(ℳ × (𝒫 → ℕ⁺)), initially ⟨⟩

Derived variables:
  enabled[p, q, g] ∈ bool, indicates when a totally ordered message can be delivered
  from process p to process q in view g:
    ∃i. queue[g](i) = p
        ∧ |queue[g](1 … i)|_p = next[p, q, g]
        ∧ ∀p' ∈ 𝒫. p' ≠ p ⇒ |queue[g](1 … i)|_{p'} ≥ next[p', q, g] − 1

Transitions:
  input ETO-GPSND(m)_p
    Eff: choose g = current-viewid[p]
         choose f = λr. next[r, p, g]
         append (m, f) to pending[p, g]

  internal ETO-ORDER(m_f, i, j)_p, choose g
    Pre: |queue[g](1 … i)|_p = j − 1
         |queue[g]|_p = j − 1
         pending[p, g](j) = m_f
    Eff: insert p into queue[g] at i

  output ETO-GPRCV(m)_{q,p}, choose g, f
    Pre: g = current-viewid[q]
         pending[q, g](next[q, p, g]) = (m, f)
         ∀r ∈ 𝒫. next[r, p, g] ≥ f(r)
         enabled[q, p, g]
    Eff: next[q, p, g] := next[q, p, g] + 1

Fig. 3. The specification modifications for ETO

ETO-… prefix, 2) the transitions for ETO-GPSND$(m)_p$ and ETO-GPRCV$(m)_{q,p}$
replace the corresponding transitions of EVS, 3) ETO-ORDER$(m_f, i, j)_p$ is a
new action, and 4) the ETO state adds the variable $queue[g]$ and extends
$pending[p, g]$ so that each entry also carries a causal snapshot. The total order
for each view $g \in \mathcal{G}$ is represented by the process sequence $queue[g]$, where
message $m_i$ in the total order is from process $queue[g](i)$. The $queue[g]$ entries
are inserted by the internal action ETO-ORDER$(m_f, i, j)_p$, which inserts process $p$
into the total order $queue[g]$ at location $i$, after all other occurrences of process $p$
in the total order.
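The effect of ETO-ORDER on the queue can be sketched in Python. This sketch follows the prose reading: $p$ is placed at 1-based position $i$, after every earlier occurrence of $p$, so that the $j$-th occurrence of $p$ in the queue stands for $p$'s $j$-th pending message. The function name and its error handling are hypothetical, and the view index $g$ is left implicit.

```python
from typing import List


def eto_order(queue: List[str], p: str, i: int, j: int, num_pending: int) -> List[str]:
    """Return queue[g] after inserting p's j-th message at 1-based position i."""
    if not 1 <= i <= len(queue) + 1:
        raise ValueError("position out of range")
    if j > num_pending:
        raise ValueError("p has not sent a j-th message")
    # p already occurs j - 1 times, and all of those occurrences precede position i.
    if queue.count(p) != j - 1 or queue[: i - 1].count(p) != j - 1:
        raise ValueError("p must be inserted after all of its earlier occurrences")
    return queue[: i - 1] + [p] + queue[i - 1:]
```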

The message delivery ordering at process $p$ is determined by the
precondition for ETO-GPRCV$(m)_{q,p}$. The precondition $\forall r \in \mathcal{P}.\ next[r, p, g] \geq f(r)$
provides causal ordering. The ETO-GPSND$(m)_q$ transition saves a causal
“snapshot” $f$ of the sender's delivery state when the message is sent. The
receiver may deliver the message only after it has delivered at least as many
messages from every process $r$ as the sender had when it sent $m$. Total ordering
is determined by the $enabled$ predicate: if $enabled[q, p, g]$ holds, then there is
some index $i$ into the total order $queue[g]$ where the number of delivered
messages from each process $p' \in \mathcal{P}$ is no more than the number of
occurrences of $p'$ in $queue[g](1 \ldots i)$. This condition for ordering allows
message deliveries that contain gaps. For example, consider the ordering
$queue[g] = \langle p_1\, p_2\, p_1\, p_2\, p_1\, p_3\, p_2\, p_3\, p_2 \rangle$, where the underlined process
identifiers represent messages that have been delivered to process $p$. Two
messages have been delivered from processes $p_1$ and $p_2$. Message deliveries
from $p_1$ and $p_3$ are no longer enabled, because they would violate the total
order. The only possible future delivery is from process $p_2$.
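The $enabled$ predicate of Figure 3 can be evaluated directly, as this hypothetical Python sketch shows. Here `nxt[r]` plays the role of $next[r, q, g]$ for one fixed receiver $q$ and view $g$, with 1 as the default. The underlining in the example above has been lost, so the usage below assumes one configuration consistent with the stated outcome: the receiver has delivered the entries at positions 1, 2, 3, 4, and 7 of the queue, which is two messages from $p_1$ and three from $p_2$.

```python
from typing import Dict, List


def enabled(queue: List[str], nxt: Dict[str, int], p: str) -> bool:
    """enabled[p, q, g]: can p's next message be delivered to the fixed receiver q?"""
    counts = {r: 0 for r in set(queue) | set(nxt)}   # occurrences in queue(1 .. i)
    for r_i in queue:                                # grow the prefix one slot at a time
        counts[r_i] += 1
        if r_i == p and counts[p] == nxt.get(p, 1):
            # The unique i with queue(i) = p and |queue(1..i)|_p = next[p].
            return all(counts[r] >= nxt.get(r, 1) - 1 for r in counts if r != p)
    return False                                     # p's next message is not yet ordered


queue_g = ["p1", "p2", "p1", "p2", "p1", "p3", "p2", "p3", "p2"]
nxt = {"p1": 3, "p2": 4, "p3": 1}   # assumed: 2, 3, and 0 messages already delivered
```

With these assumed counts, $p_1$'s third message (position 5) and $p_3$'s first message (position 6) both precede the third $p_2$ entry, which has already been delivered. Only $p_2$'s fourth message (position 9) remains enabled.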

6 The implementation algorithm (EVStoETO)

Ensemble implements services using separate protocol stacks for each
process. The layer that implements total order uses a two-phase token-based
algorithm. When a view is first installed with the EVS-NEWVIEW$(v)_p$ action,
a token is generated by the group leader (the process with the smallest
process identifier). Each message sent during the first phase, called the
ordered phase, must be associated with a token. When a process has a
message to send, it is required to obtain a token. If it obtains a token $t_i$, it
sends the message with the token and generates a new token $t_{i+1}$. During
this phase, the sent messages $(m_1, t_1), (m_2, t_2), \ldots$ can be totally ordered
by their tokens.

When messages are received by the layer from EVS in the ordered
phase, they are saved in a queue, called the ordered queue, in the order
determined by their tokens. The EVStoETO$_p$ layer delivers message $m_i$ to
the client $C_p$ only if messages $m_1, m_2, \ldots, m_{i-1}$ have been successfully
received by the layer (with the EVS-GPRCV$(m)_{q,p}$ action) and delivered to
the client (with the ETO-GPRCV$(m)_{q,p}$ action).
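The ordered phase can be sketched in a few lines of Python. The helper and class names are hypothetical, token passing between processes is not modeled, and a missing dictionary key stands for an empty (⊥) slot of the ordered queue. Senders tag messages with consecutive tokens. The receiver releases $m_i$ only once $m_1, \ldots, m_{i-1}$ have been released, whatever order the messages arrive in.

```python
from typing import Dict, List, Tuple


def send_with_token(token: int, msgs: List[str], sender: str):
    """Tag msgs with consecutive tokens; return the tagged messages and the next token."""
    tagged = [(token + k, m, sender) for k, m in enumerate(msgs)]
    return tagged, token + len(msgs)


class OrderedQueue:
    """Receiver side of the ordered phase (after ordered/order in Figure 4)."""

    def __init__(self):
        self.ordered: Dict[int, Tuple[str, str]] = {}   # token -> (m, sender)
        self.order = 1                                  # next token to deliver

    def receive(self, t: int, m: str, q: str) -> List[Tuple[str, str]]:
        """EVS-GPRCV(Ordered(t, m))_{q,p}; returns what can now go to the client."""
        self.ordered[t] = (m, q)
        out = []
        while self.order in self.ordered:               # no gap at position `order`
            out.append(self.ordered[self.order])
            self.order += 1
        return out
```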

The second phase of the protocol, called the unordered phase, can be 
entered by the layer at any time. During the unordered phase, outgoing 
messages are sent without waiting for the token, and they are designated 
as “unordered.” Layers that receive unordered messages place them on a 
queue called the unordered queue. Delivery of an unordered message to 
the client is delayed until the installation of the next view, upon which 
the layer sorts the contents of the unordered queue by process-identifier, 
and delivers the queued messages to the client before delivering the new 
view. 
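At the end of a view, handling the unordered queue comes down to a stable sort by process identifier. In this hypothetical Python sketch, process identifiers are strings compared lexicographically, and the stable sort keeps each sender's messages in arrival order. The sketch also drops messages from processes outside the new view, anticipating the discard rule stated with the EVStoETO specification.

```python
from typing import List, Set, Tuple


def flush_unordered(unordered: List[Tuple[str, str]],
                    survivors: Set[str]) -> List[Tuple[str, str]]:
    """Delivery order, at view installation, for (message, sender) pairs."""
    live = [(m, q) for (m, q) in unordered if q in survivors]
    return sorted(live, key=lambda entry: entry[1])   # stable: per-sender FIFO kept
```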

The specification for the EVStoETO layer is shown in Figures 4 and
5¹. In this specification, tokens for messages in the ordered mode are
represented by their number. The layer for EVStoETO uses four message
types to communicate information about messages and their ordering:

¹ This version fixes the original error in Ensemble and Horus, which differed in the
implementation of the precondition for ETO-GPRCV$(m)_{q,p}$, as discussed in Section 7.
EVStoETO_p

Signature:
  define ℳ_T = Ordered(t, m), t ∈ ℕ, m ∈ ℳ
             | Unordered(m), m ∈ ℳ
             | TokenReq
             | TokenSend(t, r), t ∈ ℕ, r ∈ 𝒫

  Input:    EVS-GPRCV(m)_{q,p}, m ∈ ℳ_T, p, q ∈ 𝒫
            EVS-BLOCK_p, p ∈ 𝒫
            EVS-NEWVIEW(v)_p, v ∈ 𝒱, p ∈ 𝒫
            ETO-BLOCK-OK_p, p ∈ 𝒫
            ETO-GPSND(m)_p, m ∈ ℳ, p ∈ 𝒫
  Output:   EVS-GPSND(m)_p, m ∈ ℳ_T, p ∈ 𝒫
            EVS-BLOCK-OK_p, p ∈ 𝒫
            ETO-BLOCK_p, p ∈ 𝒫
            ETO-NEWVIEW(v)_p, v ∈ 𝒱, p ∈ v.set
            ETO-GPRCV(m)_{q,p}, m ∈ ℳ, p, q ∈ 𝒫
  Internal: ETO-UNORDERED_p, p ∈ 𝒫

State:
  blocked ∈ bool, initially false
  have-block ∈ bool, initially false
  have-block-ok ∈ bool, initially false
  have-newview ∈ bool, initially false
  holds-token ∈ bool, initially true
  token ∈ ℕ, initially 1
  use-token ∈ bool, initially true
  view ∈ 𝒱, initially v_p
  request ∈ 2^𝒫, initially {}
  requested ∈ bool, initially false
  pending ∈ seqof(ℳ), initially ⟨⟩
  next ∈ ℕ⁺, initially 1
  order ∈ ℕ, initially 1
  for each t ∈ ℕ:
    ordered[t] ∈ (ℳ × 𝒫)_⊥, initially ⊥
  unordered ∈ seqof(ℳ × 𝒫), initially ⟨⟩

Transitions:
  input EVS-GPRCV(Ordered(t, m))_{q,p}
    Eff: ordered[t] := (m, q)
  input EVS-GPRCV(Unordered(m))_{q,p}
    Eff: append (m, q) to unordered
  input EVS-GPRCV(TokenReq)_{q,p}
    Eff: request := request ∪ {q}
  input EVS-GPRCV(TokenSend(t, r))_{q,p}
    Eff: if r = p ∧ use-token then
           holds-token := true
           token := t
  input ETO-GPSND(m)_p
    Eff: append m to pending
  input EVS-BLOCK_p
    Eff: have-block := true
  input EVS-NEWVIEW(v)_p
    Eff: have-newview := true
         view := v
  input ETO-BLOCK-OK_p
    Eff: have-block-ok := true
         blocked := true
  internal ETO-UNORDERED_p
    Pre: true
    Eff: holds-token := false
         use-token := false

Fig. 4. State, input, and internal transitions for EVStoETO

Ordered$(t, m)$ pairs token $t$ with message $m$, Unordered$(m)$ designates an
unordered message, TokenReq is used to request a token from another
process, and TokenSend$(t, p)$ is used to deliver token $t$ to process $p$.

The signature for the EVStoETO layer includes both actions for communicating
with EVS (the EVS-… events) and actions for communicating with the client (the
ETO-… events). In the specification, a process $p$ is allowed to initiate unordered
mode at any time with the internal event ETO-UNORDERED$_p$. The state has
three parts.

The view part maintains information about the view state and pending
views. The $blocked$ flag is true iff the client is considered to be blocked.
The $have\text{-}block$, $have\text{-}block\text{-}ok$, and $have\text{-}newview$ flags keep track of
queued block events as they are passed between EVS and the client; for
Output transitions

  output ETO-BLOCK_p
    Pre: have-block
         ¬have-newview
    Eff: have-block := false

  output EVS-BLOCK-OK_p
    Pre: have-block-ok
         next = |pending| + 1
    Eff: have-block-ok := false

  output EVS-GPSND(Ordered(t, m))_p
    Pre: pending(next) = m
         holds-token
         token = t
    Eff: next := next + 1
         token := t + 1

  output EVS-GPSND(Unordered(m))_p
    Pre: ¬use-token
         pending(next) = m
    Eff: next := next + 1

  output ETO-NEWVIEW(v)_p
    Pre: have-newview
         v = view
         ordered[order] = ⊥
         ∀i. unordered(i).proc ∉ view.set
    Eff: have-newview := false
         token := 1
         order := 1
         next := 1
         pending := ⟨⟩
         for each t ∈ ℕ
           ordered[t] := ⊥
         holds-token := ¬∃q ∈ v.set. q <_𝒫 p
         request := {}
         requested := false
         blocked := false
         use-token := true

  output ETO-GPRCV(m)_{q,p}, choose i, j
    Pre: ordered[order] = (m, q)
         ∨ (have-newview
            ∧ i > order ∧ j = 0
            ∧ ordered[i] = (m, q)
            ∧ q ∈ view.set
            ∧ ∀j' ∈ [order … i − 1]: ordered[j'] ≠ ⊥ ⇒ ordered[j'].proc ∉ view.set)
         ∨ (have-newview
            ∧ ∀k ≥ order: ordered[k] ≠ ⊥ ⇒ ordered[k].proc ∉ view.set
            ∧ unordered(j) = (m, q) ∧ i = 0
            ∧ ∀k < j. ∀p' = unordered(k).proc: p' ∉ view.set ∨ p' >_𝒫 q
            ∧ ∀k > j. ∀p' = unordered(k).proc: p' ∉ view.set ∨ p' ≥_𝒫 q)
    Eff: if ordered[order] = (m, q) then
           order := order + 1
         else if j = 0 then
           ordered[i] := ⊥
         else
           remove element j from unordered

  output EVS-GPSND(TokenReq)_p
    Pre: next ≤ |pending|
         use-token
         ¬holds-token
         ¬requested
    Eff: requested := true

  output EVS-GPSND(TokenSend(t, r))_p
    Pre: next = |pending| + 1
         holds-token
         token = t
         r ∈ request
    Eff: holds-token := false
         request := request − {r}
         requested := false

Fig. 5. Output transitions for EVStoETO

instance, $have\text{-}block$ is set in the transition for EVS-BLOCK$_p$ and reset in the
transition for ETO-BLOCK$_p$. The $view$ field is valid if the flag $have\text{-}newview$
is set, and it contains the next view to be delivered to the client.

The next part of the state is for token-management. The holds-token 
flag is set iff the process is known to hold a valid token; the token is 
represented as a number stored in the token field. The use-token flag is 
true iff the layer is in the ordered phase of the protocol. The request field 
is a set of processes known to be requesting the token. The requested flag 
is set iff process p is actively requesting the token. 

The final part of the state is for ordering and queueing. The $pending$
field contains the messages sent by the client in the current view. The
$next$ field is the index of the next message to be sent to EVS from the
pending queue. The $ordered$ queue is the queue of ordered messages that
have been received by the layer in the current view. The $order$ field is the
index of the next ordered message to be delivered to the client from
the ordered queue. Unordered messages are stored in the $unordered$ queue
until the arrival of the next view.

An ordered message is sent to EVS with the EVS-GPSND$(m)_p$ action when
the process holds the token and has a pending message; pending messages are
sent unordered only after the unordered phase has been initiated.

The ordering part of the protocol is implemented in the transition
for ETO-GPRCV$(m)_{q,p}$. There are three cases in which a message can be
delivered to the client. (1) The next ordered message $ordered[order]$ has been
queued. In this case, the message is delivered to the client, and the $order$
field is incremented. (2) A new view is pending, there is an ordered
message $m$ from process $q$ in the ordered queue, $q$ survives in the new
view, and every nonempty slot of the ordered queue from position $order$ up to $m$
holds a message from a failed process. The message is delivered to the client and
removed from the ordered queue. (3) A new view is pending, all messages in the
ordered queue belong to failed processes (processes that are not in the new view),
and message $m$ is the first message from the surviving process $q$ with the smallest
identifier in the unordered queue. The message is delivered to the client and
removed from the unordered queue.

The new view is delivered to the client only after all messages from
surviving processes have been delivered to the client from the ordered and
unordered queues. All messages from failed processes are discarded.
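The three delivery cases can be condensed into a selection function. This Python sketch is a hypothetical reading of the ETO-GPRCV precondition in Figure 5, not Ensemble code. It makes these assumptions: `ordered` maps tokens to (message, sender) pairs, with missing keys standing for ⊥; `order` is the next token to deliver; and `view_set` is the membership of the pending new view. The function returns the applicable case together with the message to deliver. If it returns `None` while a new view is pending, only messages from failed processes remain, which is the point at which the new view can be delivered.

```python
from typing import Dict, List, Optional, Set, Tuple

Entry = Tuple[str, str]                      # (message, sender)


def next_delivery(ordered: Dict[int, Entry], order: int, unordered: List[Entry],
                  have_newview: bool, view_set: Set[str]) -> Optional[Tuple[int, Entry]]:
    """Return (case, (m, q)) for one enabled ETO-GPRCV(m)_{q,p}, or None."""
    # Case 1: the next ordered message has arrived.
    if order in ordered:
        return 1, ordered[order]
    if not have_newview:
        return None
    # Case 2: the first later ordered entry from a surviving sender; every slot
    # before it is either a gap or holds a message from a failed process.
    for i in sorted(k for k in ordered if k > order):
        m, q = ordered[i]
        if q in view_set:
            return 2, (m, q)
    # Case 3: only failed senders remain in the ordered queue, so deliver the
    # earliest unordered message of the smallest surviving sender.
    live = [(m, q) for (m, q) in unordered if q in view_set]
    if live:
        smallest = min(q for _, q in live)
        return 3, next(e for e in live if e[1] == smallest)
    return None
```

In the test's gap example, the message from the failed process p2 is skipped and the next message, from the surviving process p3, is delivered. This mirrors the four-process scenario discussed in Section 7.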

The layer verification is a forward simulation relation, as described in
Chapter 8 of Lynch [14]. It shows that the implementation, consisting of EVS
composed with all the layers EVStoETO$_p$ and clients $C_p$ for each $p \in \mathcal{P}$, implements
the specification ETO composed with all the clients $C_p$ for each $p \in \mathcal{P}$. We
denote the specification by the automaton $S$ and the implementation by the
automaton $T$. We abbreviate $T$.EVStoETO$_p$ with the notation $L_p$ (for
“layer” $p$), and $T$.EVS as $V$ (for virtual synchrony). The specification
automaton $S$ is the composition of ETO and $C_p$ for each $p \in \mathcal{P}$.

For the implementation $T$, we define additional derived variables that
correspond to values in the specification $S$, as shown in Figure 6. The variable
$mpending[p, g]$ is the list of pending messages in the EVS automaton from
process $p$ in view $g$. The variable $qcount[p, q]$ is the number of messages from
process $p$ that have been queued for process $q$ by the layer EVStoETO$_q$.
The variable $mcount[p, q]$ is the number of Ordered and Unordered messages from
process $p$ that EVS has delivered to the layer EVStoETO$_q$. The variable $next[p, q]$
is the index into $mpending[p, g]$ of the next message to be delivered from process $p$
to process $q$ by the layer EVStoETO$_q$. The variable $lpending[p, q]$ is the list
of messages, both ordered and unordered, that are queued in the layer
EVStoETO$_q$ for delivery to process $q$.




130 Jason Hickey, Nancy Lynch, Robbert van Renesse 



T

Compose:
  EVS
  for each p ∈ 𝒫:
    EVStoETO_p
    C_p

Hide:
  EVS-GPSND(m)_p, m ∈ ℳ_T, p ∈ 𝒫
  EVS-BLOCK-OK_p, p ∈ 𝒫
  EVS-GPRCV(m)_{p,q}, m ∈ ℳ_T, p, q ∈ 𝒫
  EVS-BLOCK_p, p ∈ 𝒫
  EVS-NEWVIEW(v)_p, v ∈ 𝒱, p ∈ v.set

Derived variables:
  for each p ∈ 𝒫:
    current-viewid[p] ∈ 𝒢, given by EVS.current-viewid[p] if ¬EVStoETO_p.have-newview,
      or EVS.pred-viewid[EVS.current-viewid[p], p] otherwise. This view represents
      the “current view” of the EVStoETO layer.
  for each g ∈ 𝒢:
    oqueue[g] ∈ seqof(𝒫), where oqueue[g](i) = p
      if there is a pending entry j where EVS.pending[p, g](j) = Ordered(i, m).
      The length |oqueue[g]| is the number of pending ordered messages.
    uqueue[g] ∈ seqof(𝒫), where uqueue[g](i) = p
      if there is a pending entry j where EVS.pending[p, g](j) = Unordered(m),
      and |uqueue[g]|_p is the number of unordered messages in pending[p, g],
      and uqueue[g] is sorted by processor using the ordering <_𝒫 of EVS.
    queue[g] ∈ seqof(𝒫), defined by queue[g] = oqueue[g] + uqueue[g]
  for each p ∈ 𝒫, g ∈ 𝒢:
    mpending[p, g] ∈ seqof(ℳ), defined by the sequence of Ordered and Unordered
      messages in EVS.pending[p, g]
  for each p, q ∈ 𝒫:
    qcount[p, q] ∈ ℕ, defined by the number of messages from processor p in
      EVStoETO_q.ordered(EVStoETO_q.order …) and EVStoETO_q.unordered
    mcount[p, q] ∈ ℕ, defined by the number of Ordered and Unordered messages in
      EVS.pending[p, EVS.current-viewid[q]](1 … EVS.next[p, q, EVS.current-viewid[q]])
    next[p, q] ∈ ℕ, defined by mcount[p, q] − qcount[p, q]
    lpending[p, q] ∈ seqof(ℳ), defined by the sequence of messages from p
      in EVStoETO_q.ordered + EVStoETO_q.unordered

Fig. 6. Total Order Implementation

These variables provide the state correspondence shown in Figure 7. 
The proof of the simulation relation is by induction on the length of 
executions. We summarize the proof here. 

First, we show that the action $T$.EVS-GPSND$(m)_p$ corresponds to the action
$S$.ETO.ETO-ORDER$(m, i, j)_p$. The index $j$ is $L_p.next$, the position in $L_p.pending$
of the message being sent. We choose the index $i$ as follows. If $m = $ Ordered$(t, m')$
is an ordered message, then the insertion occurs at location $i = t$. If
$m = $ Unordered$(m')$ is an unordered message, then the location $i$ is the last
location in $T.queue[g]$ after all ordered messages, but before any occurrences of
processes $p' > p$.
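The choice of insertion index can be sketched in Python as follows. The function name and argument layout are hypothetical. The sketch assumes that `oqueue` and `uqueue` stand for the derived sequences of Figure 6, that `uqueue` is kept sorted by process identifier, and that indices are 1-based, as in the paper.

```python
import bisect
from typing import List, Optional


def insertion_index(oqueue: List[str], uqueue: List[str], kind: str, p: str,
                    t: Optional[int] = None) -> int:
    """Location i chosen for S.ETO.ETO-ORDER(m, i, j)_p when p sends a message."""
    if kind == "Ordered":
        return t                  # Ordered(t, m'): the token is the position
    # Unordered(m'): after every ordered entry and every unordered entry from a
    # process <= p, but before any occurrence of a process p' > p.
    return len(oqueue) + bisect.bisect_right(uqueue, p) + 1
```

For example, with ordered entries p1 p2 and unordered entries p1 p3 p3, an unordered message from p2 is placed at position 4, between the unordered p1 and the first p3.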

Next, we show that the action $T$.ETO-GPRCV$(m)_{q,p}$ corresponds directly
to the action $S$.ETO-GPRCV$(m)_{q,p}$. For this part, we need to prove that
each delivery $T$.ETO-GPRCV$(m)_{q,p}$ is both causal and enabled (with the
(1) S.ETO.created = V.created
(2) S.C_p.mode = T.C_p.mode
(3) S.ETO.mode[p] = T.C_p.mode
(4) S.ETO.all-viewids[p] = V.all-viewids[p] − {L_p.view.id}   if hnv
                           V.all-viewids[p]                    if ¬hnv
    where hnv = L_p.have-newview
(5) S.ETO.queue[g] = T.queue[g]
(6) S.ETO.pending[p, g] = L_p.pending
(7) S.ETO.next[p, q, g] = T.next[p, q]
    where g = S.ETO.current-viewid[p]

Fig. 7. State relation

$S$.ETO.$enabled[q, p, g]$ predicate). The ordering argument has three parts,
corresponding to the three disjuncts of the precondition for $L_p$.ETO-GPRCV.

For ordered messages in the first clause of the precondition, the order- 
ing conditions are straightforward. Since every message is associated with 
a token, and all messages are delivered in strict token order, causality and 
totality are trivially preserved. 

The proof for the second disjunct of ETO-GPRCV$(m)_{q,p}$ is more complex
because of causal relationships between queued messages at the arrival
of a new view. When the new view arrives, the ordered queue may contain
messages interspersed with gaps for messages that were never received by the
layer. The only assumption that can be made about the lost messages is
that they were not received by any process in the new view (the EVS-sync
property). Since the causal relationships are otherwise unknown, the
implementation can only deliver messages from processes that remain in the
new view. As we discuss in Section 7, the original Ensemble and Horus
implementations did not implement this step exactly.

Lastly, the proof of ordering for messages in the unordered queue is
straightforward. Since delivery of unordered messages is postponed until
the next view, all unordered messages are causally unrelated. The total
ordering property follows because the layers sort the messages using
the ordering over $\mathcal{P}$, and causality follows because messages from failed
processes are not delivered.

7 EVStoETO: discussion 

The most complex part of the proof is the action ETO-GPRCV, because
three different cases have to be handled: one case for ordered messages,
one for unordered messages, and one for ordered messages that have been received
during the transition when some layers are sending ordered messages
and others are sending unordered messages. The message delivery
properties of EVS do not guarantee that there will be no gaps in the ordered
queue of messages, even when a new view is passed to the layer with the
EVS-NEWVIEW$(v)_p$ action. This is a subtle point that involves the
causal ordering of messages.

We can illustrate the problem with a scenario involving four processes,
all initially in the same view. In this scenario, message $m_1$ is delivered to
process $p_2$, which immediately delivers it to the client. The client sends
a new message $m_2$, which is received by processes $p_3$ and $p_4$, and then
processes $p_1$ and $p_2$ fail.

Message $m_2$ is causally related to message $m_1$, but $m_2$, $m_3$, and
$m_4$ are causally unrelated, because clients $C_3$ and $C_4$ do not receive
any messages until the arrival of the new view, due to the gap left in
the ordered queue for message $m_1$. Because of failures, there is
no way to recover message $m_1$. Which messages should be delivered?

The implementations addressed this problem in two different ways. The
Ensemble implementation discarded all pending ordered messages at the arrival
of the new view, and the Horus implementation delivered them all. Ensemble
would discard message $m_3$ but deliver $m_4$, violating the EVS-fifo property,
and Horus would deliver message $m_2$ without delivering $m_1$, violating
ETO-causal. An implementation with the EVS-fifo and ETO-causal properties
would deliver, at most, messages $m_3$ and $m_4$.

When we first started working on the verification problem, the first
step was to derive the specifications from the Ensemble ML code, which
gave us the specification EVStoETO$_p$ shown in Figure 5 without the middle
disjunct of the precondition for ETO-GPRCV$(m)_{q,p}$. As we were doing
the simulation proof, it became clear that the simulation step for ETO-GPRCV$(m)_{q,p}$
would fail: there were some message deliveries that would not be allowed by
the specification of total order ETO. The solution seemed to be either
to strengthen the properties of EVS or to strengthen the precondition for
ETO-GPRCV.

When we spoke with the developers about this problem, we found a
line of reasoning common to both implementations: if EVS were to preserve
causal ordering of messages, the implementations would work correctly.
However, causal ordering is not provided by EVS for efficiency reasons;
applications that need causal ordering add an additional protocol layer
to implement the property. The code was corrected by implementing the
additional precondition and effect for ETO-GPRCV$(m)_{q,p}$. The changes to
the implementation code were minimal, and both implementations have
since been corrected.

8 Related work 

Birman and Joseph presented one of the earliest accounts of virtual
synchrony [4] in 1987. Since that time, many group membership and
communication specifications have appeared. An article published in 1995 [1]
points out that many of these attempts have been unsatisfactory. Several
new specifications have appeared that do not suffer from the shortcomings
identified in [1], such as [15,7,2,3]. A specification of protocol layers and their
composition appeared in [17]. Automata are used for specifying distributed
systems in [10,6]. In [11], protocol layers for point-to-point messaging are
formally specified and composed using TLA [12].

References 

1. Emmanuelle Anceaume, Bernadette Charron-Bost, Pascale Minet, and Sam Toueg.
On the formal specification of group membership services. Technical Report TR
95-1534, Cornell University Computer Science Department, August 1995.

2. Ozalp Babaoglu, Renzo Davoli, L. Giachini, and G. Baker. System support for
partition-aware network applications. In Proceedings of the 18th IEEE International
Conference on Distributed Computing Systems, May 1998.

3. Kenneth P. Birman. Building Secure and Reliable Network Applications. Manning
Publishing Company and Prentice Hall, January 1997.

4. Kenneth P. Birman and Thomas A. Joseph. Exploiting virtual synchrony in
distributed systems. In Proc. 11th Symposium on Operating Systems Principles
(SOSP), pages 123-138, November 1987.

5. R.L. Constable et al. Implementing Mathematics in the NuPRL Proof Development
System. Prentice-Hall, 1986.

6. Alan Fekete. Formal models of communications services: A case study. IEEE
Computer, 26(8):37-47, August 1993.

7. Alan Fekete, Nancy Lynch, and Alex Shvartsman. Specifying and using
partitionable group communication service. In Proc. 16th Annual ACM Symposium on
Principles of Dist. Comp., pages 52-62, 1997.

8. Mark G. Hayden. The Ensemble System. PhD thesis, Dept. of Computer Science,
Cornell University, January 1997.

9. Jason Hickey, Nancy Lynch, and Robbert van Renesse. Specifications and proofs
for Ensemble layers. Technical Report forthcoming, MIT and Cornell University,
1998. Available at http://www.cs.cornell.edu/jyh/papers/HLR98.ps.

10. Bengt Jonsson. Compositional specification and verification of distributed
systems. ACM Transactions on Programming Languages and Systems, 16(2):259-303,
March 1994.

11. David A. Karr. Protocol Composition on Horus. PhD thesis, Dept. of Computer
Science, Cornell University, December 1996.

12. Leslie Lamport. Introduction to TLA. Technical Report 1994-001, DIGITAL SRC,
Palo Alto, CA, 1994.







13. Nancy Lynch and Mark Tuttle. An introduction to Input/Output automata. Centrum
voor Wiskunde en Informatica, Amsterdam, The Netherlands, 2(3):219-246,
September 1989. Also Tech. Memo MIT/LCS/TM-373.

14. Nancy A. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.

15. Gil Neiger. A new look at membership services. In Proc. 15th Annual ACM
Symposium on Principles of Dist. Comp., pages 331-340, May 1996.

16. Robbert Van Renesse, Ken Birman, Mark Hayden, Alexey Vaysburd, and David
Karr. Building adaptive systems using Ensemble. Software-Practice and
Experience, 29(9):963-979, July 1998.

17. Robbert Van Renesse, Kenneth P. Birman, Roy Friedman, Mark Hayden, and
David A. Karr. A Framework for Protocol Composition in Horus. In Proc. 14th
Annual ACM Symposium on Principles of Dist. Comp., pages 80-89, Ottawa,
Ontario, August 1995. ACM SIGOPS-SIGACT.







10.1007/bl07031130009 




An Automated Analysis of Ping-Pong 
Interactions in E-mail Services 



Anne Bergeron and Jean-Christophe Manzoni 

LACIM, Université du Québec à Montréal,

C.P. 8888 Succursale Centre-Ville, Montreal, Quebec, 
Canada, H3C 3P8, 

{anne, manzoni}@lacim.uqam.ca



Abstract. Feature interactions occur when the composition of two
processes produces unexpected or unwanted behaviors. The problem of
detecting interactions can be formalized, but the resolution of such
interactions remains almost an art, since specifications must be changed in one
way or another. In this paper, we describe a technique that can be used to
automatically propose modifications to the original specifications in order
to remove unwanted interactions. We show that this technique successfully
removes ping-pong interactions in E-mail services, where messages are
endlessly duplicated by a careless user or a distribution list.

KEY-WORDS: Feature interactions, detection, resolution, E-mail services.



1 Introduction 

Feature interactions occur when the composition of two processes produces "bad behaviors". While such bad behaviors are not necessarily easy to detect, they can be conveniently defined as sequences of events that should not occur. Thus, the problem of defining what constitutes a feature interaction is a specification problem. For example, when processes are modelled with transition systems, one can say that a bad interaction occurs if the composition has a deadlock, or does not meet a minimal set of behaviors.

However, there is no widely accepted definition of what feature interaction resolution is [5]. One can advertise an interaction as a feature, try to redesign the interacting processes [8], add a supervisor that prevents bad interactions [4], [7], or even forbid the processes to interact at all. In this context, this paper proposes a technique that uses the detection phase to automatically suggest modifications to the specifications of the interacting processes.
As a simple example, consider the two processes $P_1$ and $P_2$ modeled by the following automata.








When these two processes are composed on identical events, we get: 




There is no possibility for this automaton to return to its initial state after event $a$ or $b$. Suppose that this is considered bad behavior: is there a way to modify the specifications of $P_1$ and $P_2$ such that good interactions can still occur, without deadlocking the whole process?

The central idea of this paper is to identify, in each process, transitions that should be ignored by the other. Continuing with our example, consider the following labelling of $P_1$ and $P_2$:








If these two processes are composed while allowing $P_1$ to ignore transition $a_1$, and process $P_2$ to ignore transition $b_1$, then we get the following composition, which is deadlock free:




Even if this new composition seems to solve the deadlock problem, there are still many questions to answer. Can we automate this technique? What does it mean for a process to ignore a transition? Does this technique always remove bad interactions? Is the new specification for the processes still acceptable?

In the following sections, we will try to answer these questions using, as a running example, well-known interactions in E-mail services that lead to problems such as endless message duplication or ping-pong effects. Section 2 describes an abstract model of E-mail services that nevertheless captures many feature interactions. Section 3 sketches the resolution technique, and Section 4 gives an example of the resolution of a non-trivial problem.



2 Modeling E-mail Services 

A basic E-mail service allows clients to exchange messages. Such a service can be augmented by features that include, for example, distribution of a message to a group of clients, or automatic response to messages. Adding these features to the basic service, especially automatic responses, can (and does) lead to eccentric behavior like endless duplication of the same message.

In order to study these behaviors, we want a model of these services that abstracts message content and delays of transmission. We will focus on exchanges of messages, assuming that if I sends a message to J, it is received immediately, and we will model only the nature and order of exchanges.

We will consider two features: a vacation function, which a client can use to reply automatically to its correspondents, and a mailing list. It is interesting to note that the smallest system that exhibits endless loop behavior, called the ping-pong effect, consists of only two clients with their vacation functions active.



2.1 E-mail Clients 

An E-mail client is the program used by a person sending and receiving electronic mail. Let I and J be two E-mail clients. Two kinds of messages can be exchanged between I and J: IJ will represent a message from I to J, and JI will represent a message from J to I.

Fig. 1 presents the two automata modeling client I and client J. 




Fig. 1. Clients 



Both automata have only one state, the normal state, in which ordinary 
electronic messages can be sent or received. 
2.2 Vacation Function

A vacation function is a program that sends a fixed message back to any client trying to reach client X. For example, when going away on holiday, client X might want to automatically inform everyone trying to reach him of his return date; to do so, he activates his vacation function.

Fig. 2 shows the automaton modeling an E-mail client I which has activated 
its vacation function. 







Fig. 2. Client I with its vacation function active 



When a message is received, in this case event JI, the system goes to state aJ and automatically sends back a message IJ. Note that client I can still send normal messages.

The automaton for client J would be similar. If there are more clients in the 
system, we add to the model an aX state for each possible correspondent X. 

2.3 Mailing List 

A mailing list is a program forwarding a message received from one subscriber 
to every other subscriber on the list, excluding the sender. When clients I and 
J are the only subscribers to the list, we get the automaton of Fig. 3. 



Fig. 3. A mailing list for two clients



State fI, for example, is reached after the reception of the message IL, which is a message from client I to the list. It is then forwarded to each subscriber, following a loop of events that eventually comes back to state nL. In the case of two subscribers, a message is forwarded only to the other subscriber.

Although the automata of Figs. 2 and 3 are relatively simple, we will see that they have a large potential for strange behaviors. In the next section, we give the formal model used for composing these automata, and tools for analysing their behavior.



3 Interaction Detection and Resolution

3.1 Definitions 

In this section, we consider an automaton $A$ on the alphabet $\Sigma$ to be the usual finite deterministic automaton structure [9], with associated recognized language $L_A$. A state $s$ is accessible if there exists a path from the initial state to $s$. We say that a state $s$ is blocking if it is accessible and there is no path from $s$ to a final state of the automaton. An automaton is blocking if it has at least one blocking state.
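Both notions reduce to graph searches: a forward search from the initial state gives the accessible states, and a backward search from the final states gives the states that can still reach one. The following minimal Python sketch assumes an automaton is stored as an initial state, a set of final states and a dict `delta` mapping `(state, event)` to the next state; these names are illustrative and are not taken from the paper or from the MEG tool.

```python
from collections import deque


def reachable(start_states, edges):
    """All states reachable from `start_states` along `edges` (dict: state -> set)."""
    seen = set(start_states)
    queue = deque(start_states)
    while queue:
        s = queue.popleft()
        for t in edges.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def blocking_states(initial, finals, delta):
    """Accessible states from which no final state can be reached.

    `delta` maps (state, event) -> state, i.e. a deterministic automaton.
    """
    forward, backward = {}, {}
    for (s, _event), t in delta.items():
        forward.setdefault(s, set()).add(t)
        backward.setdefault(t, set()).add(s)
    accessible = reachable([initial], forward)
    # A backward search from the final states yields every state that can reach one.
    co_accessible = reachable(list(finals), backward)
    return accessible - co_accessible


def is_blocking(initial, finals, delta):
    """An automaton is blocking if it has at least one blocking state."""
    return bool(blocking_states(initial, finals, delta))


# Client I with its vacation function (Fig. 2): normal state "n", reply state "aJ".
client_i = {("n", "IJ"): "n", ("n", "JI"): "aJ", ("aJ", "IJ"): "n"}
print(blocking_states("n", {"n"}, client_i))  # set()
```

Every accessible state of this client can return to its normal state, so it has no blocking state, as the definition requires of the factors in Definition 1.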

Let $A_1$ and $A_2$ be two automata on the alphabets $\Sigma_1$ and $\Sigma_2$, with initial states $i_1$ and $i_2$ and final states $F_1$ and $F_2$. A set $V$ of synchronizing vectors is a subset of $\Sigma_1 \times \Sigma_2$. The composition $A_1 \times_V A_2$ of the two automata is an automaton on the alphabet $V$ whose states are pairs $(s_1, s_2)$, where $s_1$ is a state of $A_1$ and $s_2$ is a state of $A_2$. The initial state of $A_1 \times_V A_2$ is $(i_1, i_2)$, and the final states are of the form $(s_1, s_2)$ where $s_1 \in F_1$ and $s_2 \in F_2$ (see [1]).

If $(\sigma_1, \sigma_2) \in V$ is a pair of events, the transition $(\sigma_1, \sigma_2)$ is defined in state $(s_1, s_2)$ if and only if $\sigma_1$ is defined in $s_1$ and $\sigma_2$ is defined in $s_2$.
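Because a vector fires only when both components can fire their half of it, the product can be built by exploring pairs of states outward from $(i_1, i_2)$. The sketch below is a minimal Python illustration, not the paper's tool; it assumes each automaton is a tuple `(initial, finals, delta)` with `delta` a dict from `(state, event)` to the next state, and it builds only the accessible pair-states, which is all a blocking check needs.

```python
def compose(a1, a2, vectors):
    """Accessible part of the synchronous product A1 x_V A2.

    Each automaton is (initial, finals, delta) with delta: (state, event) -> state.
    `vectors` is the set V of pairs (e1, e2).
    """
    (i1, f1, d1), (i2, f2, d2) = a1, a2
    initial = (i1, i2)
    delta = {}
    states = {initial}
    stack = [initial]
    while stack:
        s1, s2 = stack.pop()
        for e1, e2 in vectors:
            # (e1, e2) is defined in (s1, s2) iff e1 is defined in s1 and e2 in s2.
            if (s1, e1) in d1 and (s2, e2) in d2:
                target = (d1[(s1, e1)], d2[(s2, e2)])
                delta[((s1, s2), (e1, e2))] = target
                if target not in states:
                    states.add(target)
                    stack.append(target)
    finals = {(s1, s2) for (s1, s2) in states if s1 in f1 and s2 in f2}
    return initial, finals, delta, states


# Two non-blocking toy automata whose product is blocking.
a1 = ("p", {"p"}, {("p", "a"): "q", ("q", "b"): "p"})
a2 = ("r", {"r"}, {("r", "a"): "r"})
initial, finals, delta, states = compose(a1, a2, {("a", "a"), ("b", "b")})
print(sorted(states))  # [('p', 'r'), ('q', 'r')]
```

In this toy example, the pair $(q, r)$ has no outgoing transition and is not final, so the product is blocking although neither factor is: exactly the situation that Definition 1 below treats as an unwanted interaction.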

As a simple example, consider the two clients of Section 2, both having 
activated their vacation function, with the normal state as initial and final state 
(Fig. 4). 





Fig. 4. Two clients to be composed 



If one wants to compose these automata on identical events (that is, when a message is sent it is immediately received), we set the synchronization vectors to:

$$V = \{(IJ, IJ), (JI, JI)\}.$$

With these vectors, we obtain the composition of Fig. 5, which is clearly blocking: client I and client J will exchange automatic messages forever. This is a first manifestation of the ping-pong effect.
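The ping-pong effect can be reproduced mechanically. The Python sketch below is an illustrative encoding of Fig. 4 (the state names `n`, `aJ` and `aI` are ours, not the paper's): it composes the two clients on $V$ and reports the states of the product from which the initial state $(n, n)$, the only final state, can never be reached again.

```python
def compose(a1, a2, vectors):
    """Accessible part of A1 x_V A2; each automaton is (initial, finals, delta)."""
    (i1, f1, d1), (i2, f2, d2) = a1, a2
    initial = (i1, i2)
    delta, states, stack = {}, {initial}, [initial]
    while stack:
        s1, s2 = stack.pop()
        for e1, e2 in vectors:
            if (s1, e1) in d1 and (s2, e2) in d2:
                target = (d1[(s1, e1)], d2[(s2, e2)])
                delta[((s1, s2), (e1, e2))] = target
                if target not in states:
                    states.add(target)
                    stack.append(target)
    finals = {(s1, s2) for s1, s2 in states if s1 in f1 and s2 in f2}
    return initial, finals, delta, states


def coaccessible(finals, delta):
    """States from which some final state is reachable (backward fixpoint)."""
    good, changed = set(finals), True
    while changed:
        changed = False
        for (s, _vector), t in delta.items():
            if t in good and s not in good:
                good.add(s)
                changed = True
    return good


# Fig. 4: clients I and J with active vacation functions; "n" is the normal state.
client_i = ("n", {"n"}, {("n", "IJ"): "n", ("n", "JI"): "aJ", ("aJ", "IJ"): "n"})
client_j = ("n", {"n"}, {("n", "JI"): "n", ("n", "IJ"): "aI", ("aI", "JI"): "n"})
V = {("IJ", "IJ"), ("JI", "JI")}

initial, finals, delta, states = compose(client_i, client_j, V)
blocking = states - coaccessible(finals, delta)
print(sorted(blocking))  # [('aJ', 'n'), ('n', 'aI')]
```

Once either client sends a first message, the product alternates between the two blocking states through automatic replies and never returns to $(n, n)$.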

In the sequel, we will take the fact that a composition is blocking, even though its factors are non-blocking, as an indication of feature interaction problems.

Fig. 5. A blocking automaton 

Definition 1. Let $A_1$ and $A_2$ be two non-blocking automata on $\Sigma_1$ and $\Sigma_2$, and let $V \subseteq \Sigma_1 \times \Sigma_2$. $A_1$ and $A_2$ are said to have unwanted feature interactions if $A_1 \times_V A_2$ is blocking.

Using this definition, feature interaction detection can be accomplished with the usual model-checking techniques. In our case, we used MEG [2], a tool for analysing automata developed at the Université de Bordeaux.

Such an analysis allows one to identify all blocking states of an automaton. Once this set is known, it is possible to identify transitions that go from a non-blocking state to a blocking state. A first approach to resolution would be to inhibit those transitions, as in [7], and check the result for minimal requirements. Deciding which transitions can be inhibited is an engineering problem [3]: one has to select, for each process, which events can be inhibited or authorized. Such events are called controllable [6]. For example, in the E-mail service, a client could be prevented from sending messages.
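The candidate transitions for inhibition are those that cross from a non-blocking state into a blocking one. As a hedged Python illustration, the product of Fig. 5 is written out by hand below (the state names are ours); the function returns exactly the two transitions that leave the initial state, which is why inhibition alone would leave only the null action.

```python
def accessible(initial, delta):
    """States reachable from `initial`; delta maps (state, vector) -> state."""
    seen, stack = {initial}, [initial]
    while stack:
        s = stack.pop()
        for (src, _vector), t in delta.items():
            if src == s and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def coaccessible(finals, delta):
    """States from which a final state is reachable (backward fixpoint)."""
    good, changed = set(finals), True
    while changed:
        changed = False
        for (s, _vector), t in delta.items():
            if t in good and s not in good:
                good.add(s)
                changed = True
    return good


def transitions_into_blocking(initial, finals, delta):
    """Transitions leaving an accessible non-blocking state for a blocking state."""
    reach = accessible(initial, delta)
    blocking = reach - coaccessible(finals, delta)
    return {
        (s, v)
        for (s, v), t in delta.items()
        if s in reach and s not in blocking and t in blocking
    }


# Product of Fig. 5, written out by hand: states are (state of I, state of J).
fig5 = {
    (("n", "n"), ("IJ", "IJ")): ("n", "aI"),
    (("n", "n"), ("JI", "JI")): ("aJ", "n"),
    (("n", "aI"), ("JI", "JI")): ("aJ", "n"),
    (("aJ", "n"), ("IJ", "IJ")): ("n", "aI"),
}
cut = transitions_into_blocking(("n", "n"), {("n", "n")}, fig5)
print(sorted(cut))
```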

In the automaton of Fig. 5, both transitions going out of the initial state would have to be inhibited, since the two other states are blocking. This solution is clearly unacceptable, since the only action the system would be allowed to perform is the null action. The specifications of the two original processes thus have to be changed. In the next section, we describe how these specifications can be changed in order to avoid deadlocks.

3.2 Feature Interaction Resolution 

As we saw in the last section, simply inhibiting transitions that lead to blocking 
states is often unacceptable. However, one could hope to remove a deadlock by 
rerouting some transitions between blocking states. 

In order to be able to modify transitions, we first uniquely label the transitions in the blocking automaton: each copy of a transition $t$ in the product receives a unique label $t_i$. If the automata are deterministic, a copy of $t$ can be uniquely identified by its source and target states. Such a labeling induces a labeling of the two original processes. For example, Fig. 6 gives the labeling of the composition of the two client processes and the corresponding labeling of the two original processes.
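The labeling step can be sketched as follows. This hedged Python version gives every product transition its own integer label (where the paper uses the suggestive indices $n$ and $a$) and copies that label onto the two component transitions that fired it; it assumes the product is stored as a dict from `((s1, s2), (e1, e2))` to the target pair, an encoding chosen here for illustration only.

```python
def label_product(product_delta):
    """Give every product transition a unique integer label and push it down.

    `product_delta` maps ((s1, s2), (e1, e2)) -> (t1, t2). Returns the labelled
    transition dicts of both processes and the labelled synchronizing vectors.
    """
    delta1, delta2, vectors = {}, {}, set()
    # Sorting only makes the numbering reproducible; any injective labelling works.
    for k, ((src, (e1, e2)), (t1, t2)) in enumerate(sorted(product_delta.items())):
        s1, s2 = src
        l1, l2 = (e1, k), (e2, k)
        delta1[(s1, l1)] = t1
        delta2[(s2, l2)] = t2
        vectors.add((l1, l2))
    return delta1, delta2, vectors


# Product of Fig. 5 (states are pairs: state of I, state of J).
fig5 = {
    (("n", "n"), ("IJ", "IJ")): ("n", "aI"),
    (("n", "n"), ("JI", "JI")): ("aJ", "n"),
    (("n", "aI"), ("JI", "JI")): ("aJ", "n"),
    (("aJ", "n"), ("IJ", "IJ")): ("n", "aI"),
}
delta_i, delta_j, vectors = label_product(fig5)
print(sorted(vectors))
```

On this product the sketch yields four labeled vectors, matching the four vectors of $V'$ up to renaming. In particular, client J's transition from `n` to `aI` on $IJ$ is split into two labeled copies, one for each product transition that uses it.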






Fig. 6. Labeling the transitions 



The synchronization vectors have to be changed in order to account for the labels; thus the set $V = \{(IJ, IJ), (JI, JI)\}$ becomes

$$V' = \{(IJ_n, IJ_n), (IJ_a, IJ_a), (JI_n, JI_n), (JI_a, JI_a)\}.$$

The indices n and a have been chosen to reflect the state of the process that 
emitted the message. 

Our goal is to change the set $V$ so that some transitions are ignored by one of the processes. This can be done by replacing a transition of the form $(t_i, t_i)$ by one of the form $(t_i, \varepsilon)$ or $(\varepsilon, t_i)$, where $\varepsilon$ is the null event, which is assumed to exist as a transparent loop on every state of the processes.

Consider, for example, the new set of transitions:

$$V'' = \{(IJ_n, IJ_n), (IJ_a, \varepsilon), (JI_n, JI_n), (\varepsilon, JI_a)\}.$$

With these synchronization vectors, the composition of the two processes, as 
shown in Fig. 7, is non-blocking. 
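To see why $V''$ removes the deadlock, the composition can be recomputed with an explicit null event. The Python sketch below is an illustrative encoding of the labeled clients of Fig. 6 (the names `IJn`, `IJa`, `JIn`, `JIa` and `EPS` are ours): a component that receives `EPS` simply stays in its current state. Composing on the identical vectors $V'$ leaves two blocking states, while $V''$ leaves none.

```python
EPS = "eps"  # hypothetical name for the null event: a transparent loop on every state


def step(delta, state, event):
    """Next state of one process, or None if `event` is not defined in `state`."""
    if event == EPS:
        return state
    return delta.get((state, event))


def compose(a1, a2, vectors):
    """Accessible part of A1 x_V A2; each automaton is (initial, finals, delta)."""
    (i1, f1, d1), (i2, f2, d2) = a1, a2
    initial = (i1, i2)
    delta, states, stack = {}, {initial}, [initial]
    while stack:
        s1, s2 = stack.pop()
        for e1, e2 in vectors:
            t1, t2 = step(d1, s1, e1), step(d2, s2, e2)
            if t1 is None or t2 is None:
                continue
            delta[((s1, s2), (e1, e2))] = (t1, t2)
            if (t1, t2) not in states:
                states.add((t1, t2))
                stack.append((t1, t2))
    finals = {s for s in states if s[0] in f1 and s[1] in f2}
    return finals, delta, states


def blocking_states(finals, delta, states):
    """Product states that cannot reach a final state (all states are accessible)."""
    good, changed = set(finals), True
    while changed:
        changed = False
        for (s, _v), t in delta.items():
            if t in good and s not in good:
                good.add(s)
                changed = True
    return states - good


client_i = ("n", {"n"}, {("n", "IJn"): "n", ("aJ", "IJa"): "n",
                         ("n", "JIn"): "aJ", ("n", "JIa"): "aJ"})
client_j = ("n", {"n"}, {("n", "JIn"): "n", ("aI", "JIa"): "n",
                         ("n", "IJn"): "aI", ("n", "IJa"): "aI"})
v_prime = {(e, e) for e in ("IJn", "IJa", "JIn", "JIa")}
v_second = {("IJn", "IJn"), ("IJa", EPS), ("JIn", "JIn"), (EPS, "JIa")}

print(sorted(blocking_states(*compose(client_i, client_j, v_prime))))
# [('aJ', 'n'), ('n', 'aI')]
print(blocking_states(*compose(client_i, client_j, v_second)))  # set()
```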

The impact of this modification on the original specifications is to erase some 
labeled transitions. This is accomplished by forcing the erased transition to loop 
on the state on which it was defined. In the case of the two clients, the new 
specifications are given in Fig. 8. 
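Erasing a labeled transition in a specification amounts to redirecting it onto its own source state. In the minimal Python sketch below, which uses an illustrative encoding of the labeled clients of Fig. 6 (names ours), client J's copy of $IJ_a$ and client I's copy of $JI_a$ become loops. The two new specifications are then composed on the unchanged identical vectors $V'$, and the result, like Fig. 7, has no blocking state.

```python
def erase(delta, state, event):
    """Copy of `delta` in which transition (state, event) loops back on `state`."""
    if (state, event) not in delta:
        raise KeyError(f"no transition {event!r} from {state!r}")
    erased = dict(delta)
    erased[(state, event)] = state
    return erased


def compose(d1, d2, vectors, initial=("n", "n")):
    """Accessible part of the product; returns its transition dict and states."""
    delta, states, stack = {}, {initial}, [initial]
    while stack:
        s1, s2 = stack.pop()
        for e1, e2 in vectors:
            if (s1, e1) in d1 and (s2, e2) in d2:
                target = (d1[(s1, e1)], d2[(s2, e2)])
                delta[((s1, s2), (e1, e2))] = target
                if target not in states:
                    states.add(target)
                    stack.append(target)
    return delta, states


def can_return(target, delta, states):
    """Product states from which `target` is reachable."""
    good, changed = {target}, True
    while changed:
        changed = False
        for (s, _v), t in delta.items():
            if t in good and s not in good:
                good.add(s)
                changed = True
    return good & states


labeled_i = {("n", "IJn"): "n", ("aJ", "IJa"): "n", ("n", "JIn"): "aJ", ("n", "JIa"): "aJ"}
labeled_j = {("n", "JIn"): "n", ("aI", "JIa"): "n", ("n", "IJn"): "aI", ("n", "IJa"): "aI"}
v_prime = {(e, e) for e in ("IJn", "IJa", "JIn", "JIa")}

new_i = erase(labeled_i, "n", "JIa")  # I ignores automatic replies from J
new_j = erase(labeled_j, "n", "IJa")  # J ignores automatic replies from I
delta, states = compose(new_i, new_j, v_prime)
print(states - can_return(("n", "n"), delta, states))  # set()
```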

Interpreting these new specifications in the context of E-mail services, one could say that automatic messages should be ignored by the vacation function. This new specification is acceptable, since the systems retain most of their functionality: sending messages, and automatic response to non-automatic messages. In the sequel, we will use these new specifications for the vacation function.
Fig. 7. A non-blocking composition





Fig. 8. New specifications 



3.3 Erasable Transitions 

We saw that deadlocks can be prevented by erasing properly selected transitions. Note that if each transition of the form $(t_i, t_i)$ is replaced by the two transitions $(t_i, \varepsilon)$ and $(\varepsilon, t_i)$, the communication between the two processes is effectively cut off. In that case, each process ignores the other completely, and there is no interaction, either good or bad.

We thus have to propose suitable heuristics to select which transitions are to 
be erased, hoping that the new processes will retain enough functionalities and 
communication, without deadlocking. Furthermore, we do not want to introduce 
deadlocks in the original specifications. To this end, we consider the following 
definition: 

Definition 2. A transition t of an automaton A is essential if its removal in- 
troduces a deadlock in A. 
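Definition 2 can be checked by brute force: remove each transition in turn and test whether a blocking state appears. The Python sketch below is one straightforward way to do this under an assumed dict encoding (it is not the paper's tool). On the unlabeled vacation client of Fig. 2 it finds that only the automatic reply from `aJ` is essential, since without it state `aJ` can no longer return to the normal state.

```python
def accessible(initial, delta):
    """States reachable from `initial`; delta maps (state, event) -> state."""
    seen, stack = {initial}, [initial]
    while stack:
        s = stack.pop()
        for (src, _e), t in delta.items():
            if src == s and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def coaccessible(finals, delta):
    """States from which a final state is reachable (backward fixpoint)."""
    good, changed = set(finals), True
    while changed:
        changed = False
        for (s, _e), t in delta.items():
            if t in good and s not in good:
                good.add(s)
                changed = True
    return good


def is_blocking(initial, finals, delta):
    return bool(accessible(initial, delta) - coaccessible(finals, delta))


def essential_transitions(initial, finals, delta):
    """Transitions whose removal makes the automaton blocking (quadratic check)."""
    return {
        key
        for key in delta
        if is_blocking(initial, finals, {k: t for k, t in delta.items() if k != key})
    }


client_i = {("n", "IJ"): "n", ("n", "JI"): "aJ", ("aJ", "IJ"): "n"}
print(essential_transitions("n", {"n"}, client_i))  # {('aJ', 'IJ')}
```

Removing the transition on $JI$ instead only makes `aJ` inaccessible, which does not create a blocking state, so it is not essential.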

We retained three criteria under which a transition $(t_i, t_i)$ can be replaced by $(t_i, \varepsilon)$ or $(\varepsilon, t_i)$.

Let automata $A_1$ and $A_2$ model two processes on the same alphabet $\Sigma$, and let $V$ be a set of synchronizing vectors with $V \subseteq \{(t, t) \mid t \in \Sigma\}$. Suppose that the product $A_1 \times_V A_2$ is blocking. After relabelling the transitions, transition $(t_i, t_i)$ can be replaced by $(t_i, \varepsilon)$ if:

(1) Transition $t_i$ is not essential in automaton $A_2$.

(2) Transition $(t_i, t_i)$ occurs between two blocking states of the product.

(3) There is at least one transition $(t_k, t_k)$, defined in a non-blocking state, and differing from $(t_i, t_i)$ only by its label.

The conditions for replacing transition $(t_i, t_i)$ by $(\varepsilon, t_i)$ are symmetric, with $A_1$ in place of $A_2$ in condition (1).

Condition (1) ensures that the process that ignores a transition will still be able to function. Condition (2) ensures that only transitions that were already problematic will be modified. Finally, condition (3) tries to keep enough communication between the processes. Indeed, if event $t$ is completely ignored by the other process, the system will probably lose important functionality.
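The three criteria can be combined into a single check. The following hedged Python sketch assumes the labeled product is given as a dict from `((s1, s2), t)` to the target pair, where `t = (event, label)` is fired identically by both processes, and that essential transitions are computed by brute force as in Definition 2; the encoding and all names are ours. On the labeled clients of Fig. 6 it proposes exactly the two replacements $(IJ_a, \varepsilon)$ and $(\varepsilon, JI_a)$ that appear in $V''$.

```python
EPS = "eps"  # hypothetical name for the null event


def accessible(initial, delta):
    seen, stack = {initial}, [initial]
    while stack:
        s = stack.pop()
        for (src, _e), t in delta.items():
            if src == s and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def blocking(initial, finals, delta):
    """Accessible states that cannot reach a final state."""
    good, changed = set(finals), True
    while changed:
        changed = False
        for (s, _e), t in delta.items():
            if t in good and s not in good:
                good.add(s)
                changed = True
    return accessible(initial, delta) - good


def essential(initial, finals, delta):
    """Definition 2: transitions whose removal makes the automaton blocking."""
    return {k for k in delta
            if blocking(initial, finals, {j: t for j, t in delta.items() if j != k})}


def erasable(product, blocked, ess1, ess2):
    """Replacement vectors allowed by conditions (1)-(3)."""
    proposals = set()
    for ((s1, s2), t), target in product.items():
        # (2) the transition must lie between two blocking states of the product
        if (s1, s2) not in blocked or target not in blocked:
            continue
        # (3) another label of the same event must fire in a non-blocking state
        if not any(u[0] == t[0] and u != t and src not in blocked for src, u in product):
            continue
        # (1) the process that ignores t must not find that transition essential
        if (s2, t) not in ess2:
            proposals.add((t, EPS))  # A2 ignores t
        if (s1, t) not in ess1:
            proposals.add((EPS, t))  # A1 ignores t
    return proposals


IJn, IJa, JIn, JIa = ("IJ", "n"), ("IJ", "a"), ("JI", "n"), ("JI", "a")
client_i = {("n", IJn): "n", ("aJ", IJa): "n", ("n", JIn): "aJ", ("n", JIa): "aJ"}
client_j = {("n", JIn): "n", ("aI", JIa): "n", ("n", IJn): "aI", ("n", IJa): "aI"}
product = {  # composition of the labeled clients on identical vectors (Fig. 6)
    (("n", "n"), IJn): ("n", "aI"),
    (("n", "n"), JIn): ("aJ", "n"),
    (("n", "aI"), JIa): ("aJ", "n"),
    (("aJ", "n"), IJa): ("n", "aI"),
}
blocked = blocking(("n", "n"), {("n", "n")}, product)
proposals = erasable(product, blocked,
                     essential("n", {"n"}, client_i), essential("n", {"n"}, client_j))
print(sorted(proposals, key=str))
```

When both directions are allowed for the same transition, the function returns both candidates and leaves the choice to the designer.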

When all erasable transitions are erased, the resulting system can still be 
blocking but the technique can be iterated since the set of non-blocking states 
can grow. 



4 Example of Resolution: the Ping-Pong Effect. 

In Section 3, we saw that clients of an E-mail service could use their vacation 
function as long as it did not reply to automatic messages. We will now investi- 
gate what happens when those clients subscribe to a mailing list. 

4.1 A Message Exchange Process 

We now have three actors in the system: clients I and J, and a list L. In order to model interactions between the list and the clients, we first construct a message exchange process that keeps track of the clients' exchanges. Since clients I and J now have two correspondents, their specification has to be changed as in Fig. 9.




Fig. 9. Automatic responses with two correspondents 



In these automata, we assume that if a message is exchanged between client 
X and the list L, the other client loops on that event. Those loops have been 
omitted for clarity. When composing the two clients on the same events we get 
the message exchange automaton of Fig. 10. 
Fig. 10. Message exchange automaton



This automaton is non-blocking, and it is possible to show that it would be 
the case with any number of clients. The essential transitions are IJ, JI, IL and 
JL. 

The automaton representing the mailing list, Fig. 11, is a variant of Fig. 3 of 
Section 2, allowing for the fact that transitions IL and JL have been labeled. 
In this automaton, transitions LI and LJ are essential. 




Fig. 11. Mailing list automaton 



4.2 Composing the Two Processes: The Ping-Pong Effect 

Fig. 12 shows the composition of the mailing list and the message exchange 
system. This automaton has only three non-blocking states, delimited by the 
dashed lines. 

All sorts of abnormal behaviors are exhibited in this automaton. The most interesting occurs when, for example, client I sends a message to the list in the initial state. The bold path of events following $IL_n$ is a loop containing only automatic messages ($LJ$, $JL_a$, $LI$, $IL_a$), which will quickly flood the mailboxes of all three participants. This is a striking example of the ping-pong effect.

Fig. 12. The ping-pong effect

Another problem of this automaton is the existence of states in which no transition is defined. For instance, the sequence $IL_n\,LJ\,IL_n$ leads to such a state. In this case, the list wants to send a message to client J, and client J wants to send an automatic message to the list. Since neither process is in its normal state, no message can be received. One could think that the hypothesis that messages are sent and received simultaneously is too restrictive, but even if we add a buffer to the E-mail service, the problem reappears later on.



4.3 Resolution 

We applied the algorithm described in Section 3 to the automaton of Fig. 12, 
erasing transitions that satisfied all three conditions of Section 3.3. After only 
one iteration, the new composition, shown in Fig. 13, was non-blocking. 

Four types of transitions were analysed: 

(1) Transitions LI and LJ were not erasable, since they are not defined in any non-blocking state of Fig. 12.

(2) All transitions of the form IJ or JI are already ignored by the list; erasing them does not modify the composition.

(3) Only transitions that belonged to the list could be erased. The four easiest ones are:

$IL_{a1}$, $JL_{a2}$, $IL_{a6}$, $JL_{a7}$

where the numbers refer to the corresponding state numbers of the message exchange automaton of Fig. 10. They are all automatic messages sent to the list, and should be ignored. This conclusion is similar to the one we obtained in the case of two clients with active vacation functions.

(4) Two instances of normal messages also have to be ignored by the list: $IL_{n7}$ and $JL_{n6}$. These are messages sent to the list while at least one of the subscribers with an active vacation function has not yet sent an automatic acknowledgement to a previous message from the list.

Thus, normal messages should not be broadcast by the list when the exchange system is in state 6 or 7. These states can only be reached after a first broadcast by the list, and the exchange automaton returns to state 5 after each subscriber with an active vacation function has replied. Thus, if the list keeps track of the set of its absent subscribers, it can always check that all automatic responses have been sent back before resuming its normal operation. The revised specification for the mailing list is shown in Fig. 14.




Fig. 14. The revised specification of the mailing list 



5 Conclusion 

In this paper, we proposed a technique to resolve feature interactions by automatically proposing modifications to the original specifications. The basic idea is to selectively suppress certain exchanges of information between two processes in order to avoid deadlocks. We successfully applied the technique to ping-pong interactions in E-mail services.

In general, it is not possible to guarantee the suppression of all deadlocks, or that minimal requirements are met. However, the soundness of a proposed solution can be verified with the usual model-checking techniques.

We modeled the different services at a high level of abstraction, and we 
were able to capture interactions with very small automata. We think that this 
technique could be readily applied to larger models, since the complexity of the 
resolution phase is linear in the number of transitions of the automaton that 
describes the composition of the processes. 
Abstract. We present a tool for automatically checking the correctness of cryptographic protocols with finite behaviour. The underlying theory has been proposed in [13] and borrows some compositional analysis concepts for process algebras (see [3,8]). Here we extend the theory by showing an interesting relation among security properties.



1 Introduction 

The great amount of security-sensitive information that flows in computer networks has stimulated a lot of research in formal methods for the definition and the analysis of security properties of communicating systems. A typical example of a security property is that only legitimate users can access some kind of information or a particular service, or that parties in a communication get assurance about the identity of their correspondents.

Authentication protocols are the instruments used to design systems ensuring such properties; in turn, they are based on cryptographic systems. The aim of cryptographic systems is to permit the exchange of messages through insecure media, guaranteeing that only users who know a certain piece of information (a key) can retrieve the actual content of the messages.

Unfortunately, cryptography constitutes only a building block in the design of secure protocols and is not sufficient by itself, as shown by the many flaws found in authentication protocols (see [2,6,11,18]); moreover, authentication protocols, even those involving only a few communications between parties, are recognized to be prone to errors.

In recent years, several techniques to analyze communication protocols with respect to security properties have been developed (see [4,6,10,11,12,18]). Some of them are essentially based on the analysis of finite state systems, and typically can ensure freedom from errors only for a finite part of the behaviour of systems.

Another approach to the analysis of cryptographic protocols is based on proof techniques for authentication logics (see [2,7,17]) or for process algebras (see [1]). In general, these methods are not fully automated and need non-trivial human effort to analyze systems. An interesting exception is the work of Kindred and Wing ([7]), in which an original, fully automated approach is introduced for checking that a protocol enjoys some properties expressed in a logical language $\mathcal{L}$.

The methodology proposed in this paper is based on a transposition of compositional analysis techniques for process algebras (also known as partial model checking or partial evaluation in [3,8]). This approach is novel in the area of security and has been proposed by the second author as a unifying theoretical framework for the analysis of several security properties (see [13,14]). The tool proposed here shows how these ideas can be fruitfully employed in practice.

Many researchers specify security properties of a system in terms of its behaviour with respect to any environment in which it can operate (see [1,5,9,14]). We believe that this is a very natural approach and that partial evaluation techniques can be a valid tool for the analysis of this kind of property. The intuitive idea of partial evaluation is the following: verifying that a system $S$, sharing the execution environment with a generic process $X$, enjoys a property expressed by a logical formula $F$ is equivalent to verifying that $X$ itself satisfies a particular formula $F_{//S}$, computed according to the evaluation of $S$, in such a way as to guarantee:

$$S \parallel X \models F \iff X \models F_{//S}.$$

In our framework the process X can be regarded as an intruder that tries to 
discover some information; the formula F is used to state the requirement that 
in no way X can obtain such information. Our strategy will be: 

— design suitable languages for protocol description and property specification; 

— develop partial evaluation techniques (almost automatically); 

— develop satisfiability procedure for the logic. 

The latter is a key point for understanding our proposal. There is a related 
verification problem for open reactive systems recently defined by Kupferman 
and Vardi, namely module checking. The problem is to verify that every be- 
haviour of a system, induced by the interaction with an arbitrary environment, 
satisfies a temporal logic formula. It is interesting to note that compositional 
analysis techniques can be used to tackle this problem too (see [15] for a deeper 
discussion). 



2 An operational language for the description of protocols

2.1 Types and typed messages 

We assume given a set of basic type symbols $T^1, \ldots, T^n$ and a set of type constructor symbols $F_1, \ldots, F_m$ with arity function $ar : \{1, \ldots, m\} \to \mathbb{N}$, and inductively define the set of types as:

$$T ::= T^i \mid F_j(T_1, \ldots, T_{ar(j)}), \qquad i \in \{1, \ldots, n\},\ j \in \{1, \ldots, m\}$$
$$\frac{m : T_1 \quad m' : T_2}{(m, m') : T_1 \times T_2}\;(1) \qquad \frac{(m, m') : T_1 \times T_2}{m : T_1}\;(2) \qquad \frac{(m, m') : T_1 \times T_2}{m' : T_2}\;(3)$$

$$\frac{m : T \quad k : Key}{E(k, m) : E(Key, T)}\;(4) \qquad \frac{E(k, m) : E(Key, T) \quad k^{-1} : Key}{m : T}\;(5)$$

Fig. 1. Example of Inference System



We assume that any basic type $T^i$ is "populated" by a finite set of basic messages $BT^i$ and a countable set of random messages $RT^i$ such that $BT^i \cap RT^i = \emptyset$. We will write $m : T$ for "$m$ is a message of type $T$". Messages of type $F_j(T_1, \ldots, T_{ar(j)})$ are defined inductively as the minimum set satisfying:

$$\frac{m_1 : T_1 \quad \ldots \quad m_{ar(j)} : T_{ar(j)}}{F_j(m_1, \ldots, m_{ar(j)}) : F_j(T_1, \ldots, T_{ar(j)})}$$

In general, we will allow messages to contain variables standing for other messages. Here we fix some notation:

- $SubM(m : T)$ denotes the set of submessages of $m : T$;
- a message $m : T$ is pure if no variable occurs in $SubM(m : T)$;
- $Msgs(T)$ is the set of pure messages of type $T$;
- a pure message $m : T$ is initial if $\forall T^i.\ SubM(m : T) \cap Msgs(T^i) \subseteq BT^i$.

An inference system $IS$ is a set of inference schemata

$$\frac{m_1 : T_1 \quad \ldots \quad m_n : T_n}{m : T}$$

in which $m_1 : T_1, \ldots, m_n : T_n$ is a (possibly empty) set of premises and $m : T$ is the conclusion. A proof for a typed message $m : T$ is a finite tree, rooted in $m : T$, whose nodes are built from their descendants by applying a rule schema. We say that $m : T$ is deducible from a set of messages $\phi$ (and write $\phi \vdash m : T$) if there exists a proof of $m : T$ whose leaves have premises contained in $\phi$. Each inference system induces a deduction function $\mathcal{D}(\phi) = \{m : T \mid \phi \vdash m : T\}$.

Here we present a formalization of a deduction system similar to those used by many authors (see [10,12]). Among the basic types we have $Key$ for encryption keys; the type constructors are $Pair$ for pair formation, $E$ for encryption and $\cdot^{-1}$ for key inverse; messages $Pair(m_1, m_2) : Pair(T_1, T_2)$ will be written more succinctly as $(m_1, m_2) : T_1 \times T_2$. The inference system is presented in Figure 1. The interesting rules are 4 and 5: the former permits the encryption of messages by using a key, and the latter permits deducing the clear message from the encrypted message and the decryption key.
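Whether a given message belongs to $\mathcal{D}(\phi)$ under rules (1)-(5) can be decided with the usual two-phase approach: first close $\phi$ under projections and decryption, then check whether the target can be built by pairing and encryption. The Python sketch below illustrates that standard technique and is not the authors' tool. It ignores types, encodes $(m_1, m_2)$ as `("pair", m1, m2)` and $E(k, m)$ as `("enc", k, m)`, and assumes symmetric keys, i.e. $k^{-1} = k$.

```python
def synthesizable(m, known):
    """True if m can be built from `known` using pairing (1) and encryption (4)."""
    if m in known:
        return True
    if isinstance(m, tuple) and m[0] in ("pair", "enc"):
        return synthesizable(m[1], known) and synthesizable(m[2], known)
    return False


def analyse(phi):
    """Close phi under projections (2), (3) and decryption (5), with k^-1 = k assumed."""
    known = set(phi)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            if not isinstance(m, tuple):
                continue
            tag, a, b = m
            if tag == "pair":
                new = {a, b}
            elif tag == "enc" and synthesizable(a, known):
                new = {b}  # the key a is derivable, so the body b can be recovered
            else:
                new = set()
            if not new <= known:
                known |= new
                changed = True
    return known


def deducible(m, phi):
    """Decide phi |- m for this pair/encryption inference system (types ignored)."""
    return synthesizable(m, analyse(phi))


phi = {("enc", "k", ("pair", "Na", "Nb")), "k"}
print(deducible("Nb", phi))                   # True
print(deducible(("enc", "k", "Nb"), phi))     # True
print(deducible("Nb", {("enc", "k", "Nb")}))  # False
```

The analysis loop terminates because it only adds subterms of the finite set $\phi$; the infinite set $\mathcal{D}(\phi)$ itself is never enumerated.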



2.2 Syntax and semantics 

We briefly introduce the syntax of systems. A system (term) is generated by the grammar in Figure 2, where $m : T$ and $m' : T'$ are typed messages, $((m_i : T_i))_{i \in I}$ is a sequence of typed messages, $C$ is a finite set of channels with $c \in C$, $x$ is a message variable, $\phi$ is a finite set of pure typed messages, $L$ is a subset of $C$, and $i, j \in \mathbb{N}$ (the set of natural numbers). The set of channels occurring in a term $A$ will be written as $Sort(A)$.

$$
\begin{array}{rcll}
S & ::= & S \setminus L \mid S_1 \parallel S_2 \mid (A)_\phi & \text{Composed Systems}\\
A & ::= & Nil \mid a.A \mid A_1 + A_2 \mid [m : T = m' : T']A_1; A_2 \mid [((m_i : T_i))_{i \in I} \vdash_{IS} x : T]A_1; A_2 & \text{Sequential Agents}\\
a & ::= & c!m : T \mid c?(x) : T \mid \tau \mid \tau_{c,m:T} \mid \chi_{c,x:T} \mid gen^{(i,j)}_{x:T} & \text{Actions}
\end{array}
$$

Fig. 2. Syntax of systems

The inference construct $[((m_i : T_i))_{i \in I} \vdash_{IS} x : T]A_1; A_2$ acts as a binder for the variable $x$ in $A_1$, and the prefix constructs $c?(x) : T.A$, $\chi_{c,x:T}.A$ and $gen^{(i,j)}_{x:T}.A$ act as binders for the variable $x$ in $A$.

Free and bound variables are defined in the usual way, and consequently so are open and closed agents (systems). Hereafter we will consider only closed agents in which every bound variable has a different name. The set $Act$ of actions that can be performed by a compound system is defined as:

$$Act = \{\tau, \tau_g, c!m : T, c?m : T, \tau_{c,m:T}, \chi_{c,m:T} \mid c \in C, m : T, g \in RT^i\}$$

The projection $channel$, which returns the channel of an action, and the projection $msgs$, which returns the messages of a sequence of actions, can be defined straightforwardly.

The language described here is essentially CCS with values (see [16]); a different syntax is used for simplicity, and some new constructs are added to deal with message formation and deduction.

As in CCS, the operators $Nil$, $a.A$, $A_1 + A_2$, $S \setminus L$ and $S_1 \parallel S_2$ denote respectively: the process that can do nothing, the sequential composition of action $a$ and process $A$, the choice between agent $A_1$ and agent $A_2$, the restriction of process $S$ to actions not in $L$, and the parallel composition of systems $S_1$ and $S_2$. Again as in CCS, we have the internal action (denoted by $\tau$) and the communication actions send and receive of message $m : T$ over channel $c$ (denoted respectively by $c!m : T$ and $c?m : T$).

In order to model insecure channels, we also consider the action $\tau_{c,m:T}$ arising from the communication of a message $m : T$ on an insecure channel $c$. The message can be overheard by a process performing the eavesdrop action $\chi_{c,x:T}$ over channel $c$. In order to keep message handling separate from the communication aspects of the language, some new operators are introduced. The matching operator $[m : T = m' : T']A_1; A_2$ checks the equality of the typed messages $m : T$ and $m' : T'$, executing the residual $A_1$ or $A_2$ accordingly. The deduction operator $[((m_i : T_i))_{i \in I} \vdash_{IS} x : T]A_1; A_2$ deduces a new message $x$ by applying an inference schema $IS$ to the set of messages $((m_i : T_i))_{i \in I}$; the residual $A_1$ is chosen if $IS$ is applicable, otherwise the residual $A_2$ is chosen. By using this construct a finite number of times, an agent can build the proof of every message in $\mathcal{D}(\phi)$. Typically, it may be used to decrypt messages by applying a rule such as 5 in Figure 1. We also need to record the knowledge of an agent, i.e.
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(?) 



m : T £ MsgsiT) 



(c?(a;) : T.A)^ {A[m/x\)^u{7n-.T} 



{gen)- 



9 = (i,j) 



ix) 



{gen^''"’^>.A)^ {A[g/x])^u{g:T} 

m : T £ Msgs{T) 



(Xc,x-.T-A)^ 



{A[m/ x\)^U{m.-.T} 



,n . m:T^m':T' (4^)^ ^ (4' )^, 

{{m-.T = m' ■.T']Ar,A2),i, 



(!)- 



m : T £ D{(l)) 



(dm : T.A)^ (A)^ 



([]2 



m:T = m':T' (4i) 



U±_ 



(A[) 



(+1 









(A'l) 



i[m:T = m' :T']A,-A2)^^ (A[)^, ^ ' {A, + A2U ^ {A[)^, 

^ {{mi : Ti))iei \~is rn:T {Ai[m/x])^u{m:T} (^'1)0' 

{[{{mi : Ti))i^i hjs x : T]Ai;^2)<^> {^'i)<l>' 

fl{m : T){{mj : Tj))i^i \~is m : T (^2),j> (^2)<f>' 



{V2 



{[{{mi : Ti))iei \~is X : T]Ai\A2)^ 



{\L)- 



S' fi £ Act — Msgsj^ ^ S 



^ ft 



XC,m:T / 



S\L 
S - 



■ S'\L 
S' 



■S' II ft - 

g c lm-.T 



^ S II ft ^ ft II ft ^ S' lift "°- 

Fig. 3. Operational semantics. 



S' II ft 



S' II ft 



the set of messages that an agent can use to deduce new messages; this is accom- 
plished using the notation {A)^ for the agent A with set of messages <j>. Let us 
make some assumptions on the ability of sequential agents to guess random values. Randomly generated messages (nonces) are used to witness the freshness of messages during executions of the protocol (runs). To model the characteristics of these messages, we assume that for every basic type $T_b$ there is a subset $RT_b \subseteq Msgs(T_b)$ of random messages of that type. A particular operation allows an agent to guess a random value of a basic type $T$; random messages of composed types can be built by using basic random values as subcomponents. Since it should be quite unlikely to generate the same random message twice, we assume that distinct guessing actions always instantiate different values. This is achieved by using, for each basic type $T_b$, an injective function $\mathcal{R}_{T_b} : \mathbb{N} \times \mathbb{N} \rightarrow RT_b$, which associates with each index pair $(i,j)$ the corresponding guessed value. Moreover, we assume that systems in their initial configuration contain only initial messages.
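The injectivity requirement on $\mathcal{R}_{T_b}$ can be illustrated with a minimal Python sketch. It assumes (this is our reading, not stated in the text) that the pair $(i,j)$ identifies a process $i$ and its $j$-th guess, and it represents a random value of basic type $T$ as a string tagged with a natural number. The Cantor pairing function then provides an injective stand-in for $\mathcal{R}_{T_b}$. The names `cantor_pair` and `guess` are hypothetical.

```python
def cantor_pair(i: int, j: int) -> int:
    """Cantor pairing function: a bijection from N x N onto N."""
    return (i + j) * (i + j + 1) // 2 + j


def guess(basic_type: str, i: int, j: int) -> str:
    """Hypothetical stand-in for R_T(i, j): the random value of basic type T
    indexed by (i, j). It is injective because cantor_pair is injective and
    the type tag is kept in the value."""
    return f"{basic_type}#{cantor_pair(i, j)}"
```

Because the function is injective, two different guessing actions can never produce the same nonce, which is exactly the freshness assumption above.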

The formal behaviour of a compound term is described by means of a Labelled Transition System $(\mathcal{S}, Act, \{\xrightarrow{a}\}_{a\in Act})$ where:

– $\mathcal{S}$ is the set of states (compound terms);

– $Act$ is the set of actions defined above;

– $\{\xrightarrow{a}\}_{a\in Act}$ is a set of transition relations on $\mathcal{S}$, defined as the minimum set closed under the rules in figure 3.
For the sake of conciseness we have omitted the symmetrical rules $+_2$, $\|_2$ and $\lambda_2$, and we have used $Msgs_L$ for $\{c!m:T,\ c?m:T,\ \lambda c,m:T \mid c \in L,\ m:T \in Msgs(T)\}$.

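Given a finite sequence of actions $\gamma = \gamma_1, \ldots, \gamma_n$, we will write $S \xrightarrow{\gamma} S'$ instead of $S = S_0 \xrightarrow{\gamma_1} S_1 \cdots \xrightarrow{\gamma_n} S_n = S'$. Given a sequence of transitions $S \xrightarrow{\gamma} S'$ and an agent $X$ of $S$, we use $(S \xrightarrow{\gamma} S')_{|X}$ to denote the subsequence of actions performed by $X$.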
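The run notation and the projection onto an agent can be made concrete on a small finite LTS. The Python sketch below is illustrative only. It assumes a hypothetical encoding in which each transition is tagged with the agents taking part in it, and `follow` takes the first matching transition at each step instead of exploring every branch of a nondeterministic LTS.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (action, agents taking part, target state).
Transition = Tuple[str, Tuple[str, ...], str]
LTS = Dict[str, List[Transition]]


def follow(lts: LTS, start: str, gamma: List[str]) -> Optional[List[Transition]]:
    """Return one transition sequence S --gamma--> S' (first match), or None."""
    path: List[Transition] = []
    state = start
    for action in gamma:
        step = next((t for t in lts.get(state, []) if t[0] == action), None)
        if step is None:
            return None
        path.append(step)
        state = step[2]
    return path


def project(path: List[Transition], agent: str) -> List[str]:
    """(S --gamma--> S')|_X: the subsequence of actions agent X takes part in."""
    return [action for action, agents, _ in path if agent in agents]
```

For instance, on a run where `A` guesses a nonce, communicates with `B` and the intruder `X` then receives a message, projecting onto `X` keeps only the last action.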

3 A logical language for the description of protocol 
properties 

We present a logical language ($\mathcal{L}_K$) for the specification of the functional and security properties of a compound system. We need to extend a normal multimodal logic with operators dealing with the knowledge of agents. More precisely, we need operators to describe whether a given agent can deduce a particular message in a given execution $\gamma$.

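The syntax of the logical language $\mathcal{L}_K$ is defined by the following grammar:

$F ::= \mathbf{T} \mid \mathbf{F} \mid \langle a \rangle F \mid [a]F \mid \bigwedge_{i\in I} F_i \mid \bigvee_{i\in I} F_i \mid m:T \in K^{\phi}_{X,\gamma} \mid \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$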

where $m:T$ is a pure typed message, $X$ is an agent identifier, $I$ is an index set, $\phi$ is a finite set of pure typed messages and $\gamma$ is a sequence of actions ($\epsilon$ is the empty sequence). Informally, the $\langle a \rangle F$ modality expresses the possibility of performing an action $a$ and then satisfying $F$. The $[a]F$ modality expresses the necessity that, after performing an action $a$, the system satisfies $F$. A system $S$ satisfies a formula $m:T \in K^{\phi}_{X,\gamma}$ if $S$ can perform a sequence $\gamma$ of actions and an agent $X$ of $S$, with knowledge $\phi$, can deduce $m:T$ using $\phi$ plus the messages it has acquired in performing the sequence $\gamma$. This formula plays a central role in the analysis of authentication protocols, since these are often based on the sharing, between two parties, of a secret (a particular message that is assumed to be known by no one else). Such pieces of information are used to witness the identity of agents, so the possible disclosure of particular information can have dangerous consequences. For instance, the existence of a sequence $\gamma$ such that an agent $X_\phi$ can deduce $m:T$ (i.e. the secret) can be expressed formally using the formula $\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$. The language without $m:T \in K^{\phi}_{X,\gamma}$ and $\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$ is called $\mathcal{L}$.

3.1 Semantics 

We assume given a deduction function $\mathcal{D}$ which satisfies the following assumptions (e.g. the one defined in figure 1):

1 For every type $T$ the set of messages $\mathcal{D}(\phi) \cap Msgs(T)$ is finite and constructible^, when $\phi$ is a finite set. We need this assumption since we want to be able to perform an automatic analysis.

2 If $IS$ is an inference schema and $\delta$ a bijection between random values then: $m_1:T_1 \ldots m_n:T_n \vdash_{IS} m:T$ iff $\delta(m_1:T_1) \ldots \delta(m_n:T_n) \vdash_{IS} \delta(m:T)$. The idea behind this assumption is to rule out deduction systems that are not general and depend on particular random values.

3 If $m:T$ is a typed message and $m:T \in \mathcal{D}(\phi)$, then every submessage of $m:T$ of a basic type must be a submessage of some message in $\phi$. We want messages of basic type to be unforgeable.

^ Here, it means that we have an effective procedure that returns an explicit enumeration of $\mathcal{D}(\phi) \cap Msgs(T)$, i.e. its canonical index.



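We define the semantics of a formula $F \in \mathcal{L}_K$ w.r.t. an LTS associated with a composed system $S$ as follows:

For every $S$ we have $S \models \mathbf{T}$; for no $S$ we have $S \models \mathbf{F}$

$S \models \bigwedge_{i\in I} F_i$ iff $\forall i \in I : S \models F_i$

$S \models \bigvee_{i\in I} F_i$ iff $\exists i \in I : S \models F_i$

$S \models \langle a \rangle F$ iff $\exists S' : S \xrightarrow{a} S'$ and $S' \models F$

$S \models [a]F$ iff $\forall S' : S \xrightarrow{a} S'$ implies $S' \models F$

$S \models m:T \in K^{\phi}_{X,\gamma}$ iff $\exists S', \gamma' : (S \xrightarrow{\gamma'} S')_{|X} = \gamma$ and $m:T \in \mathcal{D}(\phi \cup msgs(\gamma))$

$S \models \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$ iff $\exists\gamma : S \models m:T \in K^{\phi}_{X,\gamma}$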
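The clauses for the modal fragment can be read directly as a recursive evaluator over a finite LTS. The sketch below assumes a hypothetical tuple encoding of formulas and handles only finite conjunctions and disjunctions. The knowledge operators are left out, since they would also need a deduction function.

```python
from typing import Dict, List, Tuple

# Hypothetical encodings: an LTS maps a state to its (action, target) pairs;
# formulas are ("T",), ("F",), ("and", [...]), ("or", [...]),
# ("dia", a, F) for <a>F and ("box", a, F) for [a]F.
LTS = Dict[str, List[Tuple[str, str]]]


def sat(lts: LTS, s: str, f: tuple) -> bool:
    tag = f[0]
    if tag == "T":
        return True
    if tag == "F":
        return False
    if tag == "and":
        return all(sat(lts, s, g) for g in f[1])
    if tag == "or":
        return any(sat(lts, s, g) for g in f[1])
    targets = [t for a, t in lts.get(s, []) if a == f[1]]
    if tag == "dia":  # exists S' with S --a--> S' and S' |= F
        return any(sat(lts, t, f[2]) for t in targets)
    if tag == "box":  # every S' with S --a--> S' satisfies F
        return all(sat(lts, t, f[2]) for t in targets)
    raise ValueError(f"unknown formula {f!r}")
```

Note that `[a]F` holds vacuously in a state with no `a`-successors, matching the universal clause above.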



4 Partial evaluation techniques 

To compute the partial evaluation it is convenient to assume a particular behaviour of agents with regard to the generation of random values. In particular, we require that if an agent performs a sequence of actions whose first action is the guessing of a random value $g:T$, then this value is eventually sent as a submessage of some message $m':T'$ during the sequence; moreover, between these two events only guessing actions may be performed. Such agents are called well behaved (see [13] for a formal definition). As notation we use $S \|_L X$ for $(S \| X) \setminus L$, and we consider $S = Nil$ when $\forall a \in Act : S \not\xrightarrow{a}$. Since we are interested in the analysis of formulas like $\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$, we can restrict ourselves to this particular kind of sequential agents. In fact, we can prove that if there exists a sequential agent $X_\phi$ s.t. $S \|_L X_\phi \models \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$, then there exists a well behaved agent $X'_\phi$ s.t. $S \|_L X'_\phi \models \exists\gamma : (m:T) \in K^{\phi}_{X',\gamma}$. In figure 4 we give the partial evaluation function for $\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$,^ where:

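$succ(S) = \{(c, m':T', S') \mid S \xrightarrow{c?m':T'} S' \ \text{and}\ m':T' \in \mathcal{D}(\phi)\}$,

$Rsucc(S) = \{(c, m':T', (g_i:T_i)_{i\in I}, S') \mid S \xrightarrow{c?m':T'} S' \wedge m':T' \in \mathcal{D}(\phi') \wedge \{g_i:T_i\}_{i\in I} \subseteq SubM(m':T')\}$, where $\phi' = \phi \cup \{g_i:T_i\}_{i\in I}$.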

The set $succ(S)$ represents the sending actions (and the corresponding successors of $S$) that can be performed by the intruder. The set $Rsucc(S)$ represents the sequences of guesses of random values followed by the sending of a message that can be performed by the intruder. Looking at the compositional analysis proposed in [3,8], one notes that it is somewhat semantics driven. Analogously, our partial evaluation can be derived by inspection of the operational semantics of the language. In brackets we have highlighted the corresponding intruder behaviour. It is worth noticing that $succ(S)$ is a finite set (by assumption 1).

^ For the sake of simplicity, we ignore the problem of always using a different index for the gen actions in the translated formulas (this problem can be easily solved by global counters).
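The finiteness of $succ(S)$ rests on assumption 1. For each input $c?(x):T$ offered by $S$, the intruder can only send messages from $\mathcal{D}(\phi) \cap Msgs(T)$. A minimal Python sketch follows. It assumes a hypothetical encoding of the inputs of $S$ as (channel, type, continuation) triples, and a helper `deducible_of_type` that enumerates $\mathcal{D}(\phi) \cap Msgs(T)$. Neither name comes from the paper.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple

# An input offered by the system S: channel c, expected type T and the
# continuation x |-> S' reached after receiving x (hypothetical encoding).
Input = Tuple[str, str, Callable[[str], Hashable]]


def succ(inputs: Iterable[Input],
         deducible_of_type: Callable[[str], Set[str]]) -> Set[Tuple[str, str, Hashable]]:
    """Sketch of succ(S): one triple (c, m, S') per input c?(x):T of S and per
    message m : T in D(phi) ∩ Msgs(T) that the intruder can send on c."""
    return {
        (c, m, cont(m))
        for c, typ, cont in inputs
        for m in deducible_of_type(typ)  # finite by assumption 1
    }
```

Inputs whose type has no deducible message contribute nothing, so the intruder cannot unblock them.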

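$(\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma})//S =$

$\bigvee_{(c,m':T',S') \in succ(S)} \langle c!m':T' \rangle (\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma})//S'$ (sending) $\vee$

$\bigvee_{(c,m':T',(g_i:T_i)_{i\in I},S') \in Rsucc(S)} \langle (g_i:T_i)_{i\in I} \rangle \langle c!m':T' \rangle (\exists\gamma : (m:T) \in K^{\phi \cup \{g_i:T_i\}_{i\in I}}_{X,\gamma})//S'$ (guessing) $\vee$

$\bigvee_{S \xrightarrow{c!m':T'} S'} \langle c?m':T' \rangle (\exists\gamma : (m:T) \in K^{\phi \cup \{m':T'\}}_{X,\gamma})//S'$ (receiving) $\vee$

$\bigvee_{S \xrightarrow{\tau_{c,m':T'}} S'} \langle \lambda c,m':T' \rangle (\exists\gamma : (m:T) \in K^{\phi \cup \{m':T'\}}_{X,\gamma})//S'$ (eavesdropping) $\vee$

$\bigvee_{S \xrightarrow{\tau_{c,m':T'}} S'} (\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma})//S'$ (idling) $\vee$

$(m:T \in K^{\phi}_{X,\epsilon})//S$ (trivial)

$(m:T \in K^{\phi}_{X,\epsilon})//S = \mathbf{T}$ if $m:T \in \mathcal{D}(\phi)$, $\mathbf{F}$ otherwise

Fig. 4. Partial evaluation function for $m:T \in K^{\phi}_{X,\epsilon}$ and $\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$.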

The next proposition states the correctness of the partial evaluation, where we assume that $X_\phi$ is a well behaved agent.

Proposition 1. Given a system $S$, with $Sort(S) \cup Sort(X) \subseteq L$, a finite set of typed messages $\phi$ and an initial message $m:T$, then:

$S \|_L X_\phi \models \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$ iff $X_\phi \models (\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma})//S$.

Unfortunately, the formula $F = (\exists\gamma : (m:T) \in K^{\phi}_{X,\gamma})//S$ contains various infinitary disjunctions, which are due to the analysis of the generation of random values by the agent $X$. Using our assumptions on the deduction function, we can prove that it does not matter which particular sequence of generation actions is performed: what matters is the types of the generated values. So we can translate this formula into one without infinitary disjunctions while preserving satisfiability, i.e. $F$ is satisfiable iff $\tilde{F}$ is satisfiable (see [13]). The translated formula $\tilde{F}$ contains only finitary disjunctions. This translation can be performed during the generation of $F$ and leaves the finitary part of the formula unchanged. We have thus reduced the verification of the existence of an agent $X_\phi$ s.t. $S \|_L X_\phi \models \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$ to a satisfiability problem in a sublogic of $\mathcal{L}$. Moreover, the decidability problem for this sublogic is simple, and we can build an agent (i.e. an intruder) for a satisfiable formula $\tilde{F}$. Hence, we can state:

Theorem 1. Given a system $S$, with $Sort(S) \subseteq L$, a finite set of typed messages $\phi$ and an initial message $m:T$, it is decidable whether there exists $X_\phi$, with $Sort(X) \subseteq L$, s.t. $S \|_L X_\phi \models \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma}$.
5 Authentication properties

The theory we have presented in the previous sections deals in particular with so-called secrecy properties, namely that certain pieces of information remain enclosed in a particular context. Other interesting properties are authentication properties. The definition of authentication used in [10] can be restated as follows:

Whenever a Sender A completes a run of the protocol, apparently with 
Receiver B, then B has recently been running a protocol, apparently 
with A. 

We define two distinct actions, start and finish, to model the starting of Sender and the termination of Receiver respectively: when Sender starts it issues the action start, and when Receiver terminates it issues the action finish. It is assumed that such actions cannot be performed by agents other than Sender and Receiver. In this setting it is possible to formalize the authentication property as follows (see [10]):

$\Phi$ = for any run $\gamma$: $(finish \in \gamma \Rightarrow start \in \gamma)$

Note that, since the set of runs is prefix closed, this property also implies that start precedes finish in any $\gamma$.
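Why prefix closure yields the ordering can be checked on an explicit finite set of runs. A minimal Python sketch follows, assuming runs are given as lists of action names (a hypothetical encoding). It first closes the set of runs under prefixes and then tests $\Phi$ on every run.

```python
from typing import Iterable, List, Set, Tuple


def prefix_closure(runs: Iterable[List[str]]) -> Set[Tuple[str, ...]]:
    """All prefixes of the given runs (the runs of an LTS are prefix closed)."""
    return {tuple(run[:k]) for run in runs for k in range(len(run) + 1)}


def satisfies_phi(runs: Iterable[List[str]]) -> bool:
    """Phi: for any run gamma, finish in gamma implies start in gamma."""
    return all("start" in g for g in prefix_closure(runs) if "finish" in g)
```

The run `["finish", "start"]` contains both actions, yet it violates $\Phi$ through its prefix `("finish",)`. This is exactly why $\Phi$ forces start to precede finish.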

It should be clear that, by adapting the compositional analysis techniques of the previous section, this property could easily be checked; here we prefer to show a reduction of the verification of this property to a particular secrecy property, which can be directly handled by our theory (and so by our tool). We believe that this is an interesting result in its own right, since to our knowledge this is the first attempt to perform such a reduction.

We define an encoding $\mathcal{E}$ over systems as:

$\mathcal{E}(A \| B) = (A' \| B')$

$A' = A[start := c!start_x] \ \| \ c'?(y):special.Nil$
$B' = B[finish := c'!finish_y] \ \| \ c?(x):special.Nil$

where $c, c'$ are channels not occurring in $A \| B$ and $start_x$, $finish_y$ are distinguished values. Moreover we assume:

– The intruder cannot interact over channels $c, c'$. This seems reasonable since these actions appear only for checking purposes; note that this hypothesis matches the hypothesis that the start and finish actions cannot be executed by the intruder.

– Values $start_x$ and $finish_y$ are basic values such that $start_x, finish_y \notin \phi_X$, $start_x \in \phi_A \setminus \phi_B$ and $finish_y \in \phi_B \setminus \phi_A$.

Over the system $\mathcal{E}(A \| B)$^ we will consider the following property:

$\Phi'$ = for any run $\gamma$: $(finish_y \in K_{A,\gamma} \Rightarrow start_x \in K_{B,\gamma})$

where $K_{A,\gamma}$ ($K_{B,\gamma}$) represents the knowledge of the agent A (B) after the system has performed the sequence $\gamma$. Under the above assumptions we can state:

^ Actually, our tool is also able to deal with agents that are built by using parallel composition.

Proposition 2. If $L$ is such that $Sort((A \| B) \| X) \setminus \{start, finish\} \subseteq L$ then:

$((A \| B) \| X) \setminus L \models \Phi$ iff $(\mathcal{E}(A \| B) \| X) \setminus L \models \Phi'$.

We can use our tool to check whether a system satisfies $\Phi'$ by checking that it does not satisfy:

$\exists\gamma : finish_y \in K_{A,\gamma} \wedge start_x \notin K_{B,\gamma}$

Moreover, if we force the intruder to eavesdrop every message sent over channels $c$ and $c'$, we can check the property above simply by inspecting the knowledge of the intruder, i.e. by checking the following property:

$\exists\gamma : finish_y \in K_{X,\gamma} \wedge start_x \notin K_{X,\gamma}$

6 Technical framework for the implementation 

In this section we present the more interesting theoretical aspects of our tool. In order to implement the partial evaluation function we have to specify how $succ$, $Rsucc$ and the membership of messages in $\mathcal{D}(\phi)$ can be computed. We define the size of a message $|m:T|$ as 1 if $T$ is a basic type, and as $1 + \max\{|T_i|\}_{i\in\{1,\ldots,ar(F)\}}$ if $T = F(T_1,\ldots,T_{ar(F)})$; since the size depends only on the type, we also write it $|T|$. Let $Msgs(|T|)$ be the set of messages whose type has a size smaller than or equal to $|T|$.
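The size function follows the type structure directly. The Python sketch below assumes a hypothetical encoding: a basic type is a string, and a composed type $F(T_1,\ldots,T_n)$ is a tuple whose first element names the constructor $F$. Every constructor is assumed to have arity at least 1.

```python
from typing import Tuple, Union

# Hypothetical type encoding: "Nonce" is basic, ("pair", T1, T2) is composed.
Type = Union[str, Tuple]


def size(t: Type) -> int:
    """|T| = 1 for a basic type, 1 + max |T_i| for F(T_1, ..., T_ar(F))."""
    if isinstance(t, str):
        return 1
    return 1 + max(size(arg) for arg in t[1:])


def in_msgs_up_to(t: Type, bound: Type) -> bool:
    """Whether a message of type t belongs to Msgs(|bound|)."""
    return size(t) <= size(bound)
```

For example, a pair of a nonce and a ciphertext of a nonce has size 3, one more than its largest component.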

We take as deduction system the one presented in figure 1, which can be proved to satisfy our assumptions (see [13]). In the following we define a canonical representation of the knowledge of agents, with the aim of easily computing $\mathcal{D}(\phi) \cap Msgs(T)$ and deciding $m:T \in \mathcal{D}(\phi)$ (a similar representation, but for a different problem, has been presented in [7]).

Definition 1. $\phi$ is downward closed (DC) iff $\forall m:T \in \mathcal{D}(\phi) \setminus \phi$ we have $m:T \in \mathcal{D}(\phi \cap Msgs(|T|-1))$.

It is not difficult to prove that, if $\phi$ is DC, then $m:T \in \mathcal{D}(\phi)$ iff there is a proof of $m:T$ that uses only growing rules (namely rules in which the size of the conclusion is bigger than the sizes of the premises). Hence, to decide whether $m:T \in \mathcal{D}(\phi)$ it is enough to follow recursively the structure of the message, checking whether the submessages of $m:T$ belong to $\phi$. Similarly, to compute $\mathcal{D}(\phi) \cap Msgs(T)$ we simply follow the structure of the messages, taking for basic types $T_b$ the list $\phi \cap Msgs(T_b)$, and then reconstructing the appropriate list of messages.
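Both procedures can be sketched in Python for a downward closed $\phi$. This is a minimal sketch under assumptions of our own. Basic messages are strings whose type is looked up in a hypothetical `basic_type` table, and `("pair", a, b)` and `("enc", k, x)` stand for the two constructors, each needing both arguments. Compound members of $\phi$ (for instance, ciphertexts that cannot be opened) are taken as they are.

```python
from typing import Dict, Set

# Hypothetical encoding: basic messages are strings typed by `basic_type`;
# ("pair", a, b) and ("enc", k, x) are the two constructors of this sketch.


def in_D(m, phi: Set) -> bool:
    """m in D(phi) for a DC phi: only growing (constructor) rules are needed."""
    if m in phi:
        return True
    if isinstance(m, tuple):
        _, left, right = m
        return in_D(left, phi) and in_D(right, phi)
    return False


def type_of(m, basic_type: Dict[str, str]):
    if isinstance(m, str):
        return basic_type[m]
    tag, left, right = m
    return (tag, type_of(left, basic_type), type_of(right, basic_type))


def deducible_of_type(t, phi: Set, basic_type: Dict[str, str]) -> Set:
    """D(phi) ∩ Msgs(t), following the structure of the type t."""
    found = {m for m in phi if type_of(m, basic_type) == t}
    if isinstance(t, tuple):
        tag, t_left, t_right = t
        found |= {(tag, a, b)
                  for a in deducible_of_type(t_left, phi, basic_type)
                  for b in deducible_of_type(t_right, phi, basic_type)}
    return found
```

Since $\phi$ is finite and types are finite trees, the enumeration terminates, in line with assumption 1.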

Definition 2. $\phi$ is minimal iff $\forall m:T \in \phi$ we have $m:T \notin \mathcal{D}(\phi \cap Msgs(|T|-1))$.



Definition 3. $\phi$ is a base for $B$ if $\mathcal{D}(\phi) = B$ and $\phi$ is minimal and downward closed.
The property of minimality ensures that no unnecessary message belongs to the base. Indeed, if $m:T \in \phi$ can be deduced from the DC set $\phi$ without using $m:T$ itself, then it can be deduced from other messages of $\phi$, say $m_1:T_1, \ldots, m_n:T_n$ with $|T_i| < |T|$ for $i \in \{1,\ldots,n\}$; so $m:T \in \mathcal{D}(\phi \setminus \{m:T\})$, with $\phi \setminus \{m:T\}$ DC and $\mathcal{D}(\phi) = \mathcal{D}(\phi \setminus \{m:T\})$. Moreover, this representation enjoys the following strong property:

Proposition 3. Given $\mathcal{D}(\phi)$ with $\phi$ a base, if $\psi$ is a base for $\mathcal{D}(\phi)$ then $\psi = \phi$.

The last thing we have to specify is how a base $\phi$ can be updated to a base $\phi'$ such that $\mathcal{D}(\phi') = \mathcal{D}(\phi \cup \{m:T\})$. Given a base $\phi$ for $\mathcal{D}(\phi)$, we define $Add(m:T, \phi)$ such that $\mathcal{D}(Add(m:T, \phi)) = \mathcal{D}(\phi \cup \{m:T\})$:

$Add(m:T, \phi) =$

    $\{m_1:T_1, \ldots, m_n:T_n\} = Decompose(m:T, \phi)$

    $\phi_0 = \phi \cup \{m:T\}$

    for $i = 1$ to $n$ do $\phi_i = Add(m_i:T_i, \phi_{i-1})$

    $Include(m:T, \phi_n \setminus \{m:T\})$

where:

– $Decompose(m:T, \phi)$ is the set of messages that can be derived from $m:T$; more precisely, we compute $Decompose(m:T, \phi)$ as the set of messages that can be deduced starting from the message $m:T$ and applying exactly one “destructor” rule (the projection rule for pairs and the decryption rule for encryptions). In this way we inductively consider smaller and smaller messages to be inserted in $\phi$, until undecomposable messages are reached;

– $Include(m:T, \phi)$ is the minimal $\phi' \subseteq \phi \cup \{m:T\}$ such that $\mathcal{D}(\phi') = \mathcal{D}(\phi \cup \{m:T\})$. To obtain $Include(m:T, \phi)$ we take advantage of the fact that all the relevant submessages of $m:T$ have already been included in $\phi$, so we simply need to remove from $\phi$ the messages directly derivable from $m:T$. What we do is to remove from $\phi$ all the messages that can be deduced starting from $m:T$ and applying “constructor” rules (pair formation and encryption).

Proposition 4. $Add(m:T, \phi)$ is a base for $\mathcal{D}(\phi \cup \{m:T\})$.
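The update of a base can be sketched in Python, under assumptions that go beyond the text. Encryption is taken as symmetric with basic keys: `("enc", k, x)` opens with `k`. `Decompose` also opens stored ciphertexts when the message being added is their key. `Include` is implemented as a general minimisation that drops every compound message rebuildable from the others. An extra guard returns the base unchanged when the message is already deducible. All names are hypothetical.

```python
from typing import Set, Tuple, Union

# Hypothetical encoding: basic messages are strings, ("pair", a, b) is a pair,
# ("enc", k, x) is x encrypted under the symmetric, basic key k (assumption).
Msg = Union[str, Tuple[str, object, object]]


def derivable(m: Msg, base: Set[Msg]) -> bool:
    """m in D(base) using constructor (growing) rules only; exact for a DC base."""
    if m in base:
        return True
    if isinstance(m, tuple):
        _, left, right = m
        return derivable(left, base) and derivable(right, base)
    return False


def decompose(m: Msg, base: Set[Msg]) -> Set[Msg]:
    """Messages obtained from m by exactly one destructor rule."""
    out: Set[Msg] = set()
    if isinstance(m, tuple) and m[0] == "pair":
        out |= {m[1], m[2]}                      # projections
    if isinstance(m, tuple) and m[0] == "enc" and derivable(m[1], base):
        out.add(m[2])                            # decrypt m with a known key
    for c in base:                               # m used as key on stored ciphertexts
        if isinstance(c, tuple) and c[0] == "enc" and c[1] == m:
            out.add(c[2])
    return out


def include(m: Msg, phi: Set[Msg]) -> Set[Msg]:
    """Add m, then drop every compound message rebuildable from the others."""
    result = set(phi) | {m}
    for c in list(result):
        if isinstance(c, tuple) and derivable(c, result - {c}):
            result.discard(c)
    return result


def add(m: Msg, base: Set[Msg]) -> Set[Msg]:
    if derivable(m, base):                       # guard added in this sketch
        return set(base)
    phi = set(base) | {m}                        # phi_0
    for part in decompose(m, base):              # phi_i = Add(m_i, phi_{i-1})
        phi = add(part, phi)
    return include(m, phi - {m})
```

Adding the key `k` to the base `{("enc", "k", "s")}` yields `{"k", "s"}`. The ciphertext disappears because it can now be rebuilt by encryption.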

From a practical point of view, our work supports the so-called on-the-fly analysis technique, i.e. if there are errors, they can be found even without explicitly analysing the whole system.

7 Optimizations 

In this section we highlight some further optimizations of our analysis of protocols. We have already seen that the formula produced by the partial evaluation function can be reduced to a finitary one while preserving satisfiability. Here we present other reductions on the formulas that can improve the efficiency of the verification method. From the definition it follows that the deduction functions are monotonic, i.e. if $\phi \subseteq \phi'$ then $\mathcal{D}(\phi) \subseteq \mathcal{D}(\phi')$. This leads to the following fact: if $\phi \subseteq \phi'$ then

$S \|_L X_\phi \models \exists\gamma : (m:T) \in K^{\phi}_{X,\gamma} \Rightarrow S \|_L X_{\phi'} \models \exists\gamma : (m:T) \in K^{\phi'}_{X,\gamma}$

The above implication follows from the fact that every sequence of actions that $X_\phi$ can perform can also be performed by $X_{\phi'}$.

So an intuitive approach is to consider only the behaviours of an intruder in which his knowledge grows as much as possible. This idea has been exploited by many researchers (see [18]); in particular, Shmatikov and Stern claim that they were the first to prove the soundness of this approach. Their model differs from ours, since it is based on asynchronous communications. Here we transpose their idea into our formal context and we also show the soundness of our reduction, which can be stated more easily thanks to our logical characterization. Looking at the partial evaluation function, we note that there are two possible behaviours for an intruder w.r.t. a system that can perform a communication of a message $m:T$ on a channel $c$ (i.e. a $\tau_{c,m:T}$ action): he can wait, or he can eavesdrop on the communication. Clearly, by idling he loses the possibility to increase his knowledge, and if an intruder can derive $m':T'$ starting from $\phi$, then he could derive it starting from $\phi \cup \{m:T\}$. So we have that if:

$(\exists\gamma : (m':T') \in K^{\phi}_{X,\gamma})//S'$ is satisfiable

then

$\langle \lambda c,m:T \rangle (\exists\gamma : (m':T') \in K^{\phi \cup \{m:T\}}_{X,\gamma})//S'$ is satisfiable.

Since we consider disjunctive formulas, we can safely cut off the part of the partially evaluated formula that is obtained through the analysis of the idling behaviour of the intruder with respect to a communication action: if this part is satisfiable, then the formula corresponding to the eavesdropping of this communication is satisfiable too.

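Another suggestion is to prevent the intruder from sending a message if an honest participant in the protocol can do that. We can formally state this reduction in our formalism as an equivalence of the satisfiability problem between:

$\langle c?m:T \rangle \langle c!m:T \rangle (\exists\gamma : (m':T') \in K^{\phi}_{X,\gamma})//S'$ and $(\exists\gamma : (m':T') \in K^{\phi}_{X,\gamma})//S'$

where $S \xrightarrow{\tau_{c,m:T}} S'$ and $m:T \in \phi$.

Clearly, if $(\exists\gamma : (m':T') \in K^{\phi}_{X,\gamma})//S'$ is not satisfiable, then $\langle c?m:T \rangle \langle c!m:T \rangle (\exists\gamma : (m':T') \in K^{\phi}_{X,\gamma})//S'$ is not satisfiable either; otherwise $\langle a \rangle \langle b \rangle F$ would be satisfiable while $F$ is not. The other direction of the equivalence is similar.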

Partial order reduction techniques can be applied too. In this way it is possible to exploit the independence between actions (i.e. performing one of the two actions does not prevent performing the other), for example when the system has two separate agents that can both perform a sending action. Hence, we can prove that

$\langle c?m:T \rangle \langle d?m':T' \rangle (\exists\gamma : (m'':T'') \in K^{\phi'}_{X,\gamma})//S' \ \vee \ \langle d?m':T' \rangle \langle c?m:T \rangle (\exists\gamma : (m'':T'') \in K^{\phi'}_{X,\gamma})//S'$

is equivalent (from the satisfiability point of view) to

$\langle c?m:T \rangle \langle d?m':T' \rangle (\exists\gamma : (m'':T'') \in K^{\phi'}_{X,\gamma})//S'$

where $\phi' = \phi \cup \{m:T, m':T'\}$.
8 Needham-Schroeder Public Key protocol

In this section we show an example protocol that has become paradigmatic for testing tools for cryptographic protocol analysis. For a long time it was considered correct, and it was also proved correct within a logical framework. It has a simple flaw, which arises when the system is considered in the presence of another agent. Below we present the intended execution between a sender and a receiver, using the notation common in the literature.



flawed version:

A → B : $\{Na, A\}_{PK(B)}$
B → A : $\{Na, Nb\}_{PK(A)}$
A → B : $\{Nb\}_{PK(B)}$

corrected version:

A → B : $\{Na, A\}_{PK(B)}$
B → A : $\{Na, Nb, A\}_{PK(A)}$
A → B : $\{Nb\}_{PK(B)}$



In the flawed version the sender A communicates to B a fresh nonce Na and its own name, encrypted with the public key of B (so only B, who knows the corresponding private key, can decrypt this message). Then the receiver B communicates to A the nonce Na that he has received before and a fresh nonce Nb, encrypted with the public key of A. Finally the sender communicates the nonce Nb to the receiver. According to the intention of the designer of the protocol, at the end of a run between a sender A and a receiver B, only A and B must know Na and Nb (these nonces can be used to establish a new communication with a new shared key that is a function of these values).

Our specification is based on the description of the behaviour of the two components separately. We have tested our specification and, as expected, we have found a flaw, albeit slightly different from the one presented in [10]: an intruder is able to learn the nonce Nb. To perform the verification we have only specified the initial knowledge of a possible intruder, i.e. the public keys of A and B, the names of A and B, and his own private and public keys. We do not need to give the nonces to the agents since, contrary to other approaches, our framework allows the intruder to guess them autonomously.

The following is a behaviour of an intruder that causes Nb to be leaked (we use X(A) to represent the intruder taking part in a communication as the agent A):

A → X : E(Xkey, (Na, A))

X(A) → B : E(Bkey, (Na, A))

B → X(A) : E(Akey, (Na, Nb))

X → A : E(Akey, (Na, Nb))

X(A) → B : E(Akey, Na)

A → X : E(Xkey, Nb)

The attack can be summarized as follows: the agent A starts a run of the protocol with the agent X; then the agent X can impersonate A in a run of the protocol with the agent B. The agent B sends to X(A) the message E(Akey, (Na, Nb)), which contains the fresh nonce Nb, encrypted with A's public key. The intruder is not able to decrypt this message directly, but he can send it to the agent A. The agent A will correctly decrypt E(Akey, (Na, Nb)) and then reply with the nonce Nb to X, encrypted with X's public key, since he thinks that this is the second message of his run with X. Now X knows Nb! It is interesting to note that the above intruder is not very clever, since he sends to B, as last message, an incorrect message (encrypted with a wrong key), thus allowing B to realize that there is a problem. A cleverer intruder can wait to receive the correct message from A, and then send Nb back to B, correctly encrypted. This intruder too can be found by our secrecy analysis, whereas authentication analysis can find only the latter intruder. We have then corrected the protocol similarly to [10] and verified that there are no flaws. The presented attack is found in a few seconds by our tool, and the verification of the corrected version takes less than a minute on a Pentium PC running Linux. It is interesting to note that we do not need to introduce a specification of an intruder.
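The leaking trace can be replayed step by step with symbolic cryptography. This is only a Python sketch of the trace above, not of the tool. It assumes a hypothetical model where `("enc", owner, payload)` is a message encrypted with `owner`'s public key, so only `owner` can open it. The fifth step (the incorrect message X(A) → B) is omitted, since it plays no part in the leak.

```python
from typing import Set, Tuple

# Symbolic public-key encryption (assumption of this sketch): a ciphertext
# for `owner` can only be opened by `owner`.
Cipher = Tuple[str, str, object]


def enc(owner: str, payload: object) -> Cipher:
    return ("enc", owner, payload)


def decrypt(agent: str, msg: Cipher) -> object:
    tag, owner, payload = msg
    if tag != "enc" or owner != agent:
        raise PermissionError(f"{agent} cannot open {msg!r}")
    return payload


def replay_attack() -> Set[str]:
    """Replays the leaking trace and returns the nonces learnt by X."""
    learnt: Set[str] = set()
    # A -> X : E(Xkey, (Na, A))        A starts an honest run with X
    na, _ = decrypt("X", enc("X", ("Na", "A")))
    learnt.add(na)
    # X(A) -> B : E(Bkey, (Na, A))     X re-encrypts for B, posing as A
    na_at_b, claimed = decrypt("B", enc("B", (na, "A")))
    # B -> X(A) : E(Akey, (Na, Nb))    B answers the agent it believes is A
    m3 = enc(claimed, (na_at_b, "Nb"))
    try:
        decrypt("X", m3)               # X cannot open a message meant for A ...
    except PermissionError:
        pass
    # X -> A : E(Akey, (Na, Nb))       ... so X simply forwards it to A
    _, nb = decrypt("A", m3)           # A takes it as message 2 of its run with X
    # A -> X : E(Xkey, Nb)             A returns Nb encrypted for X
    learnt.add(decrypt("X", enc("X", nb)))
    return learnt
```

The returned set contains `"Nb"`, although the intruder never holds A's private key.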



9 Conclusions and related work 



We have proposed a tool for checking security properties of cryptographic protocols. The underlying methodology is a transposition of the ideas proposed for the analysis of information flow security properties (i.e. non-interference, see [13]). Hence, compositional analysis techniques seem to provide a single conceptual framework for the study of security properties.

To our knowledge, the only previous attempt to analyze non-interference and authentication protocols within the same conceptual framework has been proposed by Focardi and Gorrieri ([4,6]), but in their work an explicit description of a particular (the most general) intruder is required.

The same limitation is present in the seminal work of Lowe [10], who applies generic tools for the verification of process algebra terms to the analysis of authentication protocols. In that paper, Lowe shows how, starting from the results of the analysis on a finite number of runs, one can deduce the correctness of the whole behaviour of the protocol.

Perhaps the work most similar to ours is that of Marrero et al. (see [12]), where a model with sequential agents is used and the explicit description of an intruder is not needed, since an axiomatic behaviour of the intruder is assumed. However, that work is more limited in scope: it does not allow the intruder to guess nonces, it uses two different methodologies for secrecy and authentication, and it does not seem to be directly generalizable (they reason within a fixed theory), while generality and flexibility are major topics of our work. Actually, our approach comes from the opposite direction: the behaviour of an intruder is automatically taken into account when one applies the point of view of compositional analysis.

Another interesting approach to modelling intruders is the one of Abadi and Gordon (see [1]). They use the theory of testing equivalence for a variant of the π-calculus to take into account the presence of unspecified intruders that try to leak some information. The idea is very appealing, but their approach differs from ours since it relies mainly on proof techniques, while our approach is tailored for automatic analysis.
Abstract. We investigate techniques for verifying hierarchical systems,
i.e., finite state systems with a nesting capability. The straightforward 
way of analysing a hierarchical system is to first flatten it into an equiv- 
alent non-hierarchical system and then apply existing finite state sys- 
tem verification techniques. Though conceptually simple, flattening is 
severely punished by the hierarchical depth of a system. To alleviate this 
problem, we develop a technique that exploits the hierarchical structure 
to reuse earlier reachability checks of superstates to conclude reacha- 
bility of substates. We combine the reusability technique with the suc- 
cessful compositional technique of [13] and investigate the combination 
experimentally on industrial systems and hierarchical systems generated 
according to our expectations of real systems. The experimental results
are very encouraging: whereas a flattening approach degrades in perfor- 
mance with an increase in the hierarchical depth (even when applying 
the technique of [13]), the new approach proves not only insensitive to 
the hierarchical depth, but even leads to improved performance as the 
depth increases. 



1 Introduction 

Finite state machines provide a convenient model for describing the control-part 
(in contrast to the data-part) of embedded reactive systems including smaller 
systems such as cellular phones, hi-fi equipment, cruise controls for cars, and 
larger systems such as train simulators, flight control systems, telephone and communication protocols. We consider a version of finite state machines called state/event
machines (SEMs). The SEM model offers the designer a number of advantages 
including automatic generation of efficient and compact code and a platform for 
formal analysis such as model-checking. In this paper we focus on, and contribute to, the latter.

Fig. 1. (a) A hierarchical model of a toy train. The system is composed of a number of serial, parallel and primitive states. (b) The model after it has been flattened.

In practice, to describe complex systems using SEMs, a number of extensions are often useful. In particular, rather than modeling a complex control as a single SEM, it is often more convenient to use a concurrent composition
of several component SEMs each typically dealing with a specific aspect of the 
control. Here we focus on an additional hierarchical extension of SEMs, in which 
states of component SEMs are either primitive or superstates which are them- 
selves (compositions of) SEMs. Figure 1(a) illustrates a hierarchical description 
of a system with two components, a Train and a Crossing. Inside the Train the 
state Move is a superstate with the two (primitive) states Left and Right. Transi- 
tions within one component may be guarded with conditions on the substates of 
other components. E.g., the ‘Go’-transition may only be fired when the machine 
Crossing is in the substate Closed. 

The Statechart notation is the pioneer of hierarchical descriptions. Introduced in 1987 by David Harel [10], it was quickly accepted as a compact and practical notation for reactive systems, as witnessed by a number of hierarchical specification formalisms such as Modecharts [11] and Rsml [12]. Also, hierarchical descriptions play a central role in recent object-oriented software methodologies (e.g., Omt [15] and Room [16]), as most clearly demonstrated by the emerging UML standard [8]. Finally, hierarchical notations are supported by a number of CASE tools, such as Statemate [2], ObjecTime [3], Rational Rose [4], and the forthcoming visualSTATE™ version 4.0 [1].

Our work has been performed in a context focusing on the commercial product visualSTATE™ and its hierarchical extension. This tool assists in developing embedded reactive software by allowing the designer to construct and manipulate SEM models. The tool is used to simulate the model, to check its consistency, and to automatically generate code from the model for the hardware of the embedded system. The consistency checker of visualSTATE™ is in fact a verification tool performing a number of generic checks which, when violated, indicate likely design errors. The checks include checking for absence of deadlocks, checking that all transitions may fire in some execution, and similarly checking that all states can be entered.

In the presence of concurrency, SEM models may describe extremely large 
state spaces^ and, unlike in traditional model checking, the number of checks to
be performed by visualSTATE™ is at least linear in the size of the model. In 
this setting, our previous work [13] offers impressive results: a number of large 
SEM models from industrial applications have been verified. Even a model with 
1421 concurrent SEMs (and 10^^® states) has been verified with modest resources 
(less than 20 minutes on a standard PC) . The technique underlying these results 
utilises the ROBDD data structure [9] in a compositional analysis which initially 
considers only a few component-machines in determining satisfaction of the ver- 
ification task and, if necessary, gradually includes more component-machines. 

Now facing hierarchical SEMs, one can obtain an equivalent concurrent com- 
position of ordinary SEMs by flattening it, that is, by recursively introducing 
for each superstate its associated SEM as a concurrent component. Figure 1(b) 
shows the flattening of the hierarchical SEM in Fig. 1(a) where the superstate 
Move has given rise to a new component mMove. Thus, verification of hierarchical systems may be carried out using flattening as a preprocessing step. E.g., demonstrating that the primitive state Left is reachable in the hierarchical version (Figure 1(a)) amounts to showing that the flattened version (Figure 1(b)) may be brought into a system state where the mMove-component and the mTrain-component are simultaneously in the states Left and Move, respectively.
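The translation of a reachability question under flattening can be sketched as follows. The Python code assumes the hierarchy is given as a hypothetical child-to-container map, and that the flattened component for a container C is named "m" + C, following the naming of Fig. 1(b). It yields one conjunct per level of the hierarchy, so the flattened query grows with the depth of the state.

```python
from typing import Dict, List, Tuple


def flattened_condition(state: str, parent: Dict[str, str]) -> List[Tuple[str, str]]:
    """(component, local state) pairs that must hold together in the flattened
    system for `state` to be active. `parent` maps a state to the superstate
    or top-level machine containing it (hypothetical encoding)."""
    condition: List[Tuple[str, str]] = []
    while state in parent:
        container = parent[state]
        condition.append(("m" + container, state))
        state = container
    return condition
```

For the toy train, reaching Left requires mMove in Left and mTrain in Move, which is exactly the system state described above.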

Though conceptually simple, verification of hierarchical systems via flatten- 
ing is, as we will argue below (Section 2) and later experimentally demonstrate, 
severely punished by the hierarchical depth of a system; even when combined 
with our successful compositional technique of [13] for ordinary SEMs. 

To alleviate this problem, we introduce in this paper a new verification tech- 
nique that uses the hierarchical structure to reuse earlier reachability checks 
of superstates to conclude reachability of substates. We develop the reusability 
technique for a hierarchical SEM model inspired by Statechart and combine 
it with the compositionality technique of [13]. We investigate the combination 
experimentally on hierarchical systems generated according to our expectations
of real systems.^ The experimental results are very encouraging: whereas the
flattening approach degrades in performance with an increase in the hierarchical 
depth, it is clearly demonstrated that our new approach is not only insensitive 
to the hierarchical depth, but even leads to improved performance as the depth 
increases. In addition, for non-hierarchical (flat) systems the new method is an 
instantiation of, and performs as well as, the compositional technique of [13]. 



^ The so-called state-explosion problem. 

^ In short, we expect that transitions and dependencies between parts of a well- 
designed hierarchical system are more likely to occur between parts close to each 
other rather than far from each other in the hierarchy. 
Related Work

R. Alur and M. Yannakakis’ work on hierarchical Kripke structures offers important
worst-case complexity results for both LTL and CTL model checking [5].
However, their results are restricted to sequential hierarchical machines and use 
the fact that abstract superstates may appear in several instantiations. In con- 
trast we provide verification results for general hierarchical systems with both 
sequential and parallel superstates without depending on multiple instantiations 
of abstract superstates. 

Park, Skakkebaek and Dill [14] have developed an algorithm for the automatic
generation of invariants for states in Rsml specifications. Using these invariants it
is possible to perform some of the same checks that we provide for hierarchical 
SEMs. Their algorithm works on an approximation of the specification, and uses 
the fact that Rsml does allow internal events sent from one state to another. 

2 Flattening and Reusability 

To see why the simple flattening approach is vulnerable to the hierarchical depth,
consider the (schematic) hierarchical system of Fig. 2(a). The flattened version
of this system will contain (at least) a concurrent component $mS_i$ for each of
the superstates $S_i$ for $0 \le i \le 100$. Assume that we want to check that the state
u is reachable. As reachability of a state in a hierarchical system automatically
implies reachability of all its superstates, we must demonstrate that the flattened
system can reach a state satisfying the following condition:^



$mS_{100}@u \wedge mS_{99}@S_{100} \wedge mS_{98}@S_{99} \wedge \cdots \wedge mS_0@S_1$ .



Consequently, we are faced with a reachability question immediately involving a
large number of component SEMs, which in turn means that poor performance
of our compositional technique [13] is to be expected. Even worse, realizing all
the checks of visualSTATE™ means that we must, in a similarly costly manner,
demonstrate reachability of the states x, y, z and v. All these checks contain
$mS_{99}@S_{100} \wedge mS_{98}@S_{99} \wedge \cdots \wedge mS_0@S_1$ as a common part. Hence, we are in fact
repeatedly establishing reachability of $S_{100}$ as part of checking reachability of
x, y, z, u and v. As this situation may occur at all (100) levels, the consequence
may be an exponential explosion of our verification effort.

Let us instead try to involve the hierarchical structure more actively and
assume that we have already demonstrated, in some previous check, that $S_{100}$ is
reachable (maybe from an analysis of a more abstract version of the model in
which $S_{100}$ was in fact a primitive state).

How can we reuse this fact to simplify reachability-checking of, say, u? Assume
first a simple setting (Figure 2(a)), where $S_{100}$ is only activated by transitions
to $S_{100}$ itself (and not to substates within $S_{100}$), and transitions in $S_{100}$ only
depend (as indicated by the guard g) on substates within $S_{100}$ itself. In this

Here mS@T denotes that the component mS is in state T. 
Fig. 2. Simple and complex substates.



case we may settle the reachability question by simply analysing $S_{100}$ as a system
of its own. In more complex situations (Figure 2(b)), $S_{100}$ may possibly be
activated in several ways, including via transitions into some of its substates. Also,
the transitions within $S_{100}$ may refer to states outside $S_{100}$ (indicated by the
guard g*). In such cases, in analogy with our previous compositional technique
[13], we compute the set of states which, regardless of behaviour outside $S_{100}$,
may reach u. If this set contains all potential initial states of $S_{100}$ (in Fig. 2(b)
the states x, y, u), we may infer from the known reachability of $S_{100}$ that u is
also reachable. Otherwise, we simply extend the collection of superstates
considered, depending on the guards within $S_{100}$ and the transitions to $S_{100}$.
In the obvious way, transitions between (super)states and their guards de- 
termine the pattern of dependencies between states in a hierarchical system. We 
believe that in good hierarchical designs, dependencies are more likely to exist 
between states close to each other in the hierarchy rather than states hierarchi- 
cally far from each other. Thus, the simple scenario depicted in Fig. 2(a) should 
in many cases be encountered with only small extensions of the considered su- 
perstates. 

3 The Hierarchical State/Event Model 

A hierarchical state/event machine (HSEM) is a hierarchical automaton consist- 
ing of a number of nested primitive, serial, and parallel states. Transitions can 
be performed between any two states regardless of their type and level, and are 
labeled with an event, a guard, and a multiset of outputs. Formally an HSEM 
is a 7-tuple 

$M = (S, E, O, T, \mathit{Sub}, \mathit{type}, \mathit{def})$   (1)

of states $S$, events $E$, outputs $O$, transitions $T$, a function $\mathit{Sub} : S \to \mathcal{P}(S)$
associating states with their substates, a function $\mathit{type} : S \to \{pr, se, pa\}$ mapping
states to their type (indicating whether a state is primitive, serial, or parallel),
and a partial function $\mathit{def} : S \rightharpoonup S$ mapping serial states to their default
substate. The set of serial states in $S$ is referred to as $R$.

The set of transitions is $T \subseteq S \times E \times G \times \mathcal{M}(O) \times S$, where $\mathcal{M}(O)$ is the set
of all multisets of outputs, and $G$ is the set of guards derived from the grammar
$g ::= g_1 \wedge g_2 \mid \neg g \mid tt \mid s$. The atomic predicate $s$ is a state synchronisation on the
state $s$, having the intuitive interpretation that $s$ is true whenever $s$ is active (we
will return to the formal semantics in a moment). We use $t = (s_t, e_t, g_t, o_t, s'_t)$ to
range over syntactic transitions (with source, event, guard, outputs and target,
respectively).

For notational convenience we write $s \searrow s'$ whenever $s' \in \mathit{Sub}(s)$. Furthermore,
we define $\searrow^+$ to be the transitive closure, and $\searrow^*$ to be the transitive and
reflexive closure of $\searrow$. If $s \searrow^+ s'$ we say that $s$ is above $s'$, and $s'$ is below $s$.
The graph $(S, \searrow)$ is required to be a tree, where the leaves and only the leaves
are primitive states, i.e., $\forall s : \mathit{type}(s) = pr \Leftrightarrow \mathit{Sub}(s) = \emptyset$.

For a set of states $I$, $\mathrm{lca}(I)$ denotes the least common ancestor of $I$ with
respect to $\searrow$. For a state $s$, $\mathrm{lsa}(s)$ denotes the least serial ancestor of $s$. The
scope of a transition $t$ is denoted $\chi(t)$ and represents the least common serial
ancestor of the states $s_t$ and $s'_t$. For those transitions in which such a state does
not exist, we say that $\chi(t) = \$$, where $\$$ is a dummy state above all other states,
i.e., $\forall s \in S : \$ \searrow^+ s$.
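The tree-navigation notions above (lca, lsa and the scope $\chi(t)$) are easy to compute from a parent map. In the Python sketch below, the tree and the names `Sys`, `Train`, `Gate`, `Move`, and so on are hypothetical. We read "ancestor" in lsa and $\chi(t)$ as strict (a state is not its own serial ancestor), which matches the use of `i := lsa(i)` to move upwards in Algorithm 1, while the lca of a set is taken with respect to the reflexive relation $\searrow^*$, so that $\mathrm{lca}(\{s\}) = s$.

```python
from typing import Dict, Iterable, List, Optional

DUMMY = "$"  # the dummy state placed above all other states

# Hypothetical tree: Sys (parallel) -> Train, Gate (serial); Train -> Move (serial), Stop;
# Move -> Left, Right; Gate -> Open, Closed.
parent: Dict[str, Optional[str]] = {
    "Sys": None, "Train": "Sys", "Gate": "Sys", "Move": "Train", "Stop": "Train",
    "Left": "Move", "Right": "Move", "Open": "Gate", "Closed": "Gate"}
kind = {"Sys": "pa", "Train": "se", "Gate": "se", "Move": "se", "Stop": "pr",
        "Left": "pr", "Right": "pr", "Open": "pr", "Closed": "pr"}


def proper_ancestors(s: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """States strictly above s, ordered from parent(s) up to the root."""
    chain = []
    a = parent[s]
    while a is not None:
        chain.append(a)
        a = parent[a]
    return chain


def lca(states: Iterable[str], parent: Dict[str, Optional[str]]) -> str:
    """Least common ancestor w.r.t. the reflexive-transitive substate relation."""
    chains = [[s] + proper_ancestors(s, parent) for s in states]
    common = set(chains[0]).intersection(*chains[1:])
    return next(a for a in chains[0] if a in common)  # lowest common element


def lsa(s: str, parent, kind) -> str:
    """Least serial ancestor strictly above s (DUMMY if there is none)."""
    return next((a for a in proper_ancestors(s, parent) if kind[a] == "se"), DUMMY)


def scope(src: str, tgt: str, parent, kind) -> str:
    """chi(t): least serial state strictly above both the source and the target."""
    above_tgt = set(proper_ancestors(tgt, parent))
    return next((a for a in proper_ancestors(src, parent)
                 if a in above_tgt and kind[a] == "se"), DUMMY)


print(scope("Left", "Right", parent, kind))   # Move
print(scope("Left", "Stop", parent, kind))    # Train
print(scope("Stop", "Open", parent, kind))    # $  (the only common ancestor is parallel)
print(lca(["Left", "Open"], parent), lsa("Left", parent, kind))  # Sys Move
```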

A configuration of an HSEM is an $|R|$-tuple of states indexed by the serial
states. The configuration space $\Sigma$ of an HSEM is the product of the sets of
substates of the serial states:

$\Sigma = \prod_{s \in R} \mathit{Sub}(s)$ .   (2)

The projection $\pi_s : \Sigma \to \mathit{Sub}(s)$ of a configuration $\sigma$ onto a serial state $s$ yields
the value of $s$ in $\sigma$. The projection of a configuration onto a parallel or primitive
state is undefined. A state $s$ is active in $\sigma$ if either $s$ is the root state, the parent
of $s$ is an active parallel state, or the parent is an active serial state and $s$ is
the projection of $\sigma$ onto the parent. In order to formalise this we define the infix
operator in as



$s \ \mathrm{in}\ \sigma \iff \forall s' \searrow^+ s : s' \in R \Rightarrow \pi_{s'}(\sigma) \searrow^* s$ .   (3)

We denote by $\Sigma_s = \{\sigma \mid s \ \mathrm{in}\ \sigma\}$ the set of configurations in which $s$ is active.
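As an explicit-state illustration of Eqs. (2) and (3), the following Python sketch enumerates the configuration space as dictionaries mapping each serial state to one of its substates, and implements the in operator. The tree is a hypothetical example. Because a configuration records a value for every serial state, it may assign a value to a serial state that is currently inactive (here `Move` when `Train` is in `Stop`).

```python
from itertools import product
from typing import Dict, Iterator, List, Optional

# Hypothetical example tree: a parallel root with two serial components.
sub: Dict[str, List[str]] = {
    "Sys": ["Train", "Gate"], "Train": ["Move", "Stop"], "Move": ["Left", "Right"],
    "Gate": ["Open", "Closed"], "Stop": [], "Left": [], "Right": [],
    "Open": [], "Closed": []}
kind = {"Sys": "pa", "Train": "se", "Move": "se", "Gate": "se", "Stop": "pr",
        "Left": "pr", "Right": "pr", "Open": "pr", "Closed": "pr"}
parent: Dict[str, Optional[str]] = {c: p for p, cs in sub.items() for c in cs}
parent["Sys"] = None


def configurations() -> Iterator[Dict[str, str]]:
    """Enumerate Sigma, the product of Sub(s) over all serial states s (Eq. (2))."""
    serial = sorted(s for s in sub if kind[s] == "se")
    for values in product(*(sub[s] for s in serial)):
        yield dict(zip(serial, values))


def is_in(s: str, sigma: Dict[str, str]) -> bool:
    """'s in sigma' (Eq. (3)): every serial state above s projects towards s."""
    child, a = s, parent[s]
    while a is not None:
        if kind[a] == "se" and sigma[a] != child:
            return False
        child, a = a, parent[a]
    return True


def sigma_of(s: str) -> List[Dict[str, str]]:
    """Sigma_s: the configurations in which s is active."""
    return [c for c in configurations() if is_in(s, c)]


print(len(list(configurations())), len(sigma_of("Left")), len(sigma_of("Stop")))  # 8 2 4
```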

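Let $\sigma \models g$ whenever $\sigma$ satisfies $g$. The interpretation of a guard is defined as:
$\sigma \models tt$ (any configuration satisfies the true guard), $\sigma \models s$ iff $s$ in $\sigma$, $\sigma \models g_1 \wedge g_2$
iff $\sigma \models g_1$ and $\sigma \models g_2$, and $\sigma \models \neg g$ iff $\sigma \not\models g$. A pair $(e, \sigma)$ is said to enable a
transition $t$, written $(e, \sigma) \models t$, iff $e = e_t$, $s_t$ in $\sigma$, and $\sigma \models g_t$.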
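Guard satisfaction and enabling can be sketched in the same explicit style. The nested-tuple encoding of guards (`("tt",)`, `("st", s)`, `("and", g1, g2)`, `("not", g)`), the hypothetical tree, and the example transition are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Hypothetical tree: Sys (parallel) has serial children Train and Gate;
# Train has Move (serial) and Stop; Move has Left and Right.
parent: Dict[str, Optional[str]] = {
    "Sys": None, "Train": "Sys", "Gate": "Sys", "Move": "Train", "Stop": "Train",
    "Left": "Move", "Right": "Move", "Open": "Gate", "Closed": "Gate"}
kind = {"Sys": "pa", "Train": "se", "Gate": "se", "Move": "se"}  # others primitive


def is_in(s: str, sigma: Dict[str, str]) -> bool:
    """'s in sigma': every serial ancestor of s currently points towards s."""
    child, a = s, parent[s]
    while a is not None:
        if kind.get(a) == "se" and sigma[a] != child:
            return False
        child, a = a, parent[a]
    return True


Guard = tuple  # assumed encoding of g ::= g1 and g2 | not g | tt | s


def sat(sigma: Dict[str, str], g: Guard) -> bool:
    """sigma |= g."""
    op = g[0]
    if op == "tt":
        return True
    if op == "st":                      # state synchronisation on g[1]
        return is_in(g[1], sigma)
    if op == "and":
        return sat(sigma, g[1]) and sat(sigma, g[2])
    if op == "not":
        return not sat(sigma, g[1])
    raise ValueError(f"unknown guard {g!r}")


def enables(e: str, sigma: Dict[str, str], t) -> bool:
    """(e, sigma) |= t for t = (source, event, guard, outputs, target)."""
    src, ev, guard, _outputs, _tgt = t
    return e == ev and is_in(src, sigma) and sat(sigma, guard)


t = ("Left", "go", ("and", ("st", "Open"), ("not", ("st", "Stop"))), ("beep",), "Right")
sigma = {"Train": "Move", "Move": "Left", "Gate": "Open"}
print(enables("go", sigma, t), enables("go", {**sigma, "Gate": "Closed"}, t))  # True False
```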

Before introducing the formal semantics, we summarise the intuitive idea
behind a computation step in an HSEM. An HSEM is event driven, i.e., it only
reacts when an event is received from the environment. When this happens, a
maximal set of non-conflicting and enabled transitions is executed, where
non-conflicting means that no two transitions in the set have nested scopes. This conforms
to the idea that the scope defines the area affected by the transition. When
a transition is executed, it forces a state change to the target. All implicitly
activated serial states enter their default state. In fact, a transition is understood
to leave the scope and immediately reactivate it.

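Formally, a set $\Delta \subseteq T$ is enabled on $(e, \sigma)$ if $\forall t \in \Delta : (e, \sigma) \models t$; $\Delta$ is
compatible if $\forall t, t' \in \Delta : t \neq t' \Rightarrow \chi(t) \not\searrow^* \chi(t')$; and $\Delta$ is maximal if
$\forall \Delta' \subseteq T : \Delta \subset \Delta' \Rightarrow \Delta'$ is incompatible or disabled on $(e, \sigma)$. The semantics of
an HSEM is defined in terms of a transition relation ${\to} \subseteq \Sigma \times E \times \mathcal{M}(O) \times \Sigma$.
We have $\sigma \xrightarrow{e/o} \sigma'$ if and only if there exists a set $\Delta \subseteq T$ such that:^

1. $\Delta$ is compatible, enabled on $(e, \sigma)$, and maximal,

2. $o = \biguplus_{t \in \Delta} o_t$,

3. $\forall t \in \Delta : s'_t \ \mathrm{in}\ \sigma'$,

4. $\forall t \in \Delta, s \in S : s \ \mathrm{in}\ \sigma' \wedge \mathit{type}(s) = se \wedge \chi(t) \searrow^+ s \wedge \neg(s \searrow^+ s'_t) \Rightarrow \pi_s(\sigma') = \mathit{def}(s)$,

and

5. $\forall s \in R : (\neg \exists t \in \Delta : \chi(t) \searrow^* s) \Rightarrow \pi_s(\sigma) = \pi_s(\sigma')$.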

The second constraint defines the output of the transition, the third that all 
targets are active after the transition, the fourth that all implicitly activated 
serial states (those not on the path between the scope and the target of any 
transition) are recursively set to their default state, and the last that all states 
not under the scope of any transition remain unchanged. 
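To make constraints 3 to 5 concrete, here is a Python sketch of the successor configuration for the special case where $\Delta$ contains a single transition; outputs and maximality (constraints 1 and 2) are ignored. The tree and the default map `default` are hypothetical. As we read the constraints, the value of a serial state that lies below the scope but is inactive in $\sigma'$ is left unconstrained; the sketch simply keeps its old value.

```python
from typing import Dict, List, Optional

# Hypothetical example tree and default substates (illustrative names only).
parent: Dict[str, Optional[str]] = {
    "Sys": None, "Train": "Sys", "Gate": "Sys", "Move": "Train", "Stop": "Train",
    "Left": "Move", "Right": "Move", "Open": "Gate", "Closed": "Gate"}
kind = {"Sys": "pa", "Train": "se", "Gate": "se", "Move": "se"}  # unlisted: primitive
default = {"Train": "Stop", "Move": "Right", "Gate": "Open"}     # def(s) for s in R
DUMMY = "$"  # dummy scope above all states


def proper_ancestors(s: str) -> List[str]:
    """States strictly above s, from parent(s) up to the root."""
    chain, a = [], parent[s]
    while a is not None:
        chain.append(a)
        a = parent[a]
    return chain


def is_in(s: str, sigma: Dict[str, str]) -> bool:
    """'s in sigma' (Eq. (3))."""
    child = s
    for a in proper_ancestors(s):
        if kind.get(a) == "se" and sigma[a] != child:
            return False
        child = a
    return True


def scope(src: str, tgt: str) -> str:
    """chi(t): least serial state strictly above both source and target."""
    above_tgt = set(proper_ancestors(tgt))
    return next((a for a in proper_ancestors(src)
                 if a in above_tgt and kind.get(a) == "se"), DUMMY)


def fire(sigma: Dict[str, str], src: str, tgt: str) -> Dict[str, str]:
    """Successor of sigma for Delta = {t}, where t goes from src to tgt."""
    chi = scope(src, tgt)
    new = dict(sigma)  # constraint 5: serial states outside the scope keep their value
    # Constraint 3: make the target active by setting the serial states on the
    # path from the target up to (and including) the scope.
    child = tgt
    for a in proper_ancestors(tgt):
        if kind.get(a) == "se":
            new[a] = child
        if a == chi:
            break
        child = a
    # Constraint 4: active serial states below the scope that are not strictly
    # above the target enter their default substate; visiting them top-down
    # makes the default entry recursive.
    above_tgt = set(proper_ancestors(tgt))
    for s in sorted(default, key=lambda x: len(proper_ancestors(x))):
        below_chi = chi == DUMMY or chi in proper_ancestors(s)
        if below_chi and s not in above_tgt and is_in(s, new):
            new[s] = default[s]
    return new


s0 = {"Train": "Move", "Move": "Left", "Gate": "Closed"}
s1 = fire(s0, "Left", "Stop")
print(s1)                        # {'Train': 'Stop', 'Move': 'Left', 'Gate': 'Closed'}
print(fire(s1, "Stop", "Move"))  # {'Train': 'Move', 'Move': 'Right', 'Gate': 'Closed'}
```

In the second step the target `Move` is itself serial, so it enters its default substate `Right`, while `Gate`, which is outside the scope `Train`, is untouched.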



4 Reusable Reachability Checking 

The consistency checker of visualSTATE™ performs seven predefined types of 
checks, each of which can be reduced to verifying one of two types of properties. 
The first property type is reachability. For instance, visualSTATE™ checks for 
absence of dead code in the sense that all transitions must be possibly enabled 
and all states must be possibly entered. E.g., checking whether a transition t will 
ever become executable is equivalent to checking whether its guard is satisfiable, 
i.e., whether we can reach a configuration $\sigma$ such that $\exists e : (e, \sigma) \models t$. Similarly,
checking whether a state $s$ may be entered amounts to checking whether the
system can reach a configuration within $\Sigma_s$.

The remaining two types of consistency checks reduce to a check for absence 
of local deadlocks. A local deadlock occurs if the system can reach a configuration 
in which one of the superstates will never change value nor be deactivated no 
matter what sequence of events is offered. 

In the following two sections we present our novel technique exploiting reus- 
ability and compositionality through its application to reachability analysis only. 
In the full version of this paper [7] and in [6], the application of the technique
to local deadlock detection is described in detail.

In general, a reachability question involves a set of goal configurations $X \subseteq \Sigma$.
The question posed is whether $X$ is reachable in the sense that there exists a
^ The symbol $\uplus$ denotes multiset union.
X := {σ | σ is a goal conf.}
while Init(i) ⊈ X and σ0 ∉ X do
begin
    X' := B_i(X) ∪ X
    if X ≠ X' then
        X := X'
    else if Init(i) ∩ X ≠ ∅ then
        i := lsa(i)
    else
        return false
end
return true

(b) Algorithm 1.



Fig. 3. Reusable reachability check. 



sequence of events such that the system, starting at the initial configuration $\sigma_0$,
enters a configuration in $X$. To explain the idea of reusability, let $i$ be a state such
that $X \subseteq \Sigma_i$, i.e., reachability of any configuration within $X$ implies reachability
of the state $i$ (see Fig. 3(a)). Notice that such a state always exists, e.g., the root
will satisfy this condition for any $X \subseteq \Sigma$. Also, if $X = \Sigma_s$, any superstate of $s$ will
suffice. The question we ask is how existing information about reachability of $i$
may be reused to simplify reachability-checking of $X$. The simple case is clearly
when $i$ is not reachable. In this case there is no way that $X$ can be reachable
either, since $X$ only contains configurations where $i$ is active. Since we expect
(or hope) most of the reachability questions issued by visualSTATE™ to be true,
this only superficially reduces the number of computations. However, although
more challenging, we can also make use of the information that $i$ is reachable,
as explained below.

Knowing that $i$ is reachable still leaves open which of the configurations in $\Sigma_i$
are in fact reachable (and in particular whether any configuration in $X$ is). However,
any reachable configuration $\sigma$ in $\Sigma_i$ must necessarily be reachable through a
sequence of the following form:^

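$\sigma_0 \to \sigma_1 \to \cdots \to \sigma_n \to \underbrace{\sigma_{n+1} \to \sigma_{n+2} \to \cdots \to \sigma_{n+k} \to \sigma}_{\in\, \Sigma_i}$ .   (4)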

Let the initial configurations for $i$, $\mathit{Init}(i)$, be the configurations for which $i$ is
active and which are reachable in one step from a configuration in which $i$ is
inactive (e.g., the configuration $\sigma_{n+1}$ in (4); see also Fig. 3(a)). Algorithmically,
(an over-approximation of) $\mathit{Init}(i)$ may be obtained effectively in a straightforward
manner directly from the syntactic transitions. Consider then the following

^ Here $\sigma \to \sigma'$ abbreviates $\exists e, o : \sigma \xrightarrow{e/o} \sigma'$.
backwards step computation:

$B_i(Y) = \{\sigma \in \Sigma_i \mid \exists \sigma' : \sigma \to \sigma' \wedge \sigma' \in Y\}$ .   (5)

that is, $B_i(Y)$ is the set of configurations with $i$ active which in one step may
reach $Y$. To settle reachability of $X$, we iteratively apply $B_i$ according to
Algorithm 1 in Fig. 3(b). Reachability of $X$ may now be confirmed if either the initial
configuration of the system is encountered ($\sigma_0 \in X$) or the backwards iteration
reaches a stage with all initial configurations for $i$ included ($\mathit{Init}(i) \subseteq X$). Dually, if
the backwards iteration reaches a fixed point ($X = B_i(X) \cup X$), reachability of
$X$ can be rejected if no initial configuration for $i$ has been encountered (i.e.,
$X \cap \mathit{Init}(i) = \emptyset$): by (4), any path to $X$ either stays within $\Sigma_i$ from $\sigma_0$ onwards
or enters $\Sigma_i$ for the last time through a configuration in $\mathit{Init}(i)$, so one of these
would have been found. If some but not all of the initial configurations for $i$ have
been encountered, the analysis does not allow us to decide reachability
of $X$ based on reachability of $i$. Instead, the backwards iteration is continued
with $i$ replaced by its directly enclosing serial superstate, $\mathrm{lsa}(i)$.
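Algorithm 1 can be prototyped over explicit sets of configurations. In the Python sketch below, `sigma_of`, `init_of`, `lsa` and `succ` are caller-supplied functions standing for $\Sigma_i$, $\mathit{Init}(i)$, lsa and the one-step successor relation. The sketch further assumes that the goal lies within $\Sigma_i$ and that $\mathit{Init}$ of the root is $\{\sigma_0\}$, so the loop never climbs above the root. The configuration graph in the example is hypothetical, and the checkers described in Section 6 use ROBDDs rather than explicit sets.

```python
from typing import Callable, Hashable, Iterable, Set

Conf = Hashable  # a configuration; any hashable encoding will do


def reusable_reachable(goal: Set[Conf], sigma0: Conf, i: str,
                       lsa: Callable[[str], str],
                       sigma_of: Callable[[str], Set[Conf]],
                       init_of: Callable[[str], Set[Conf]],
                       succ: Callable[[Conf], Iterable[Conf]]) -> bool:
    """Explicit-state sketch of Algorithm 1 (Fig. 3(b))."""
    X = set(goal)
    while not init_of(i) <= X and sigma0 not in X:
        # X' := B_i(X) u X, with B_i as in Eq. (5)
        X_new = {c for c in sigma_of(i) if any(d in X for d in succ(c))} | X
        if X_new != X:
            X = X_new
        elif init_of(i) & X:      # some, but not all, entry configurations reach X
            i = lsa(i)
        else:                     # fixed point without any entry configuration of i
            return False
    return True


# Hypothetical configuration graph 0..7; superstate "A" is active in {2, 3, 4, 5}
# and can be entered via 1 -> 2 or via 7 -> 4 (configuration 7 is unreachable).
succ = {0: [1], 1: [2], 2: [3], 3: [2], 4: [5], 5: [], 6: [], 7: [4]}
sigma = {"root": set(range(8)), "A": {2, 3, 4, 5}}
init = {"root": {0}, "A": {2, 4}}
up = {"A": "root"}


def check(goal: Set[Conf]) -> bool:
    return reusable_reachable(goal, 0, "A", up.__getitem__, sigma.__getitem__,
                              init.__getitem__, succ.__getitem__)


print(check({3}), check({5}))  # True False
```

In the example, `{3}` is confirmed only after climbing from `A` to the root, because just one of the two entry configurations of `A` reaches it, while `{5}` is rejected at the root.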

The reusability approach depends on a previous reachability check of the 
non-primitive states in the system. Since this is itself a series of reachability 
checks the above approach can be applied immediately if we perform a preorder 
traversal of the state tree determining reachability of each state as we encounter 
them, reusing the previous checks. If a state turns out to be unreachable we can 
immediately conclude that all substates are unreachable. 
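The preorder strategy just described can be sketched as a tree traversal in which the reachability check of a state is only invoked when its parent is already known to be reachable. The function `check` is a hypothetical stand-in for a reusable reachability check (for example Algorithm 1 started at the parent). The recorded `calls` show that the substates of the unreachable state `A` are never checked.

```python
from typing import Callable, Dict, List


def preorder_reachability(root: str, children: Dict[str, List[str]],
                          check: Callable[[str], bool]) -> Dict[str, bool]:
    """Decide reachability of every state in a preorder traversal of the state tree.

    `check(s)` may assume that the parent of s is already known to be reachable.
    Substates of an unreachable state are marked unreachable without any check.
    """
    reachable: Dict[str, bool] = {root: True}  # the root is always active

    def visit(s: str, parent_reachable: bool) -> None:
        # `and` short-circuits, so check(s) is skipped below unreachable states
        reachable[s] = parent_reachable and check(s)
        for c in children.get(s, []):
            visit(c, reachable[s])

    for c in children.get(root, []):
        visit(c, True)
    return reachable


children = {"Sys": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}
calls: List[str] = []


def check(s: str) -> bool:
    calls.append(s)
    return s != "A"  # pretend that A turns out to be unreachable


result = preorder_reachability("Sys", children, check)
print(result["a1"], calls)  # False ['A', 'B', 'b1']
```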

5 Compositional Reachability Checking 

The reusable reachability analysis offered by the algorithm of Fig. 3(b) is based
on the backward step function $B_i$. An obvious drawback is that computation of
$B_i$ requires access to the global transition relation $\to$. In this section we show
how to incorporate the compositional technique of [13] by replacing the use of
$B_i$ with a backwards step function, $CB_I$, which only requires partial knowledge
about the transition relation corresponding to a selected and minimal subsystem.
The selection is determined by a sort $I$ identifying the set of superstates
currently considered. Initially, the sort $I$ only includes superstates directly relevant
for the reachability question. Later, superstates on which the initial
sort behaviourally depends are also included.

A subset $I$ of $R$ (the set of serial states) is called a sort if it is non-empty and
convex in the sense that $u \in I$ whenever $u \in R$ and $\mathrm{lca}(I) \searrow^* u \searrow^* y$ for some $y \in I$.^
For any non-empty set $A \subseteq R$, the set $\mathit{Convex}(A)$ denotes the minimal superset
of $A$ satisfying the properties for a sort. The state $\mathrm{lca}(I)$ of a sort will turn out
to be an ideal choice for the state $i$ used in the reusable reachability algorithm
in the previous section.
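Because adding serial states that lie between $\mathrm{lca}(A)$ and the members of $A$ never changes the least common ancestor, $\mathit{Convex}(A)$ can be computed in a single upward pass from each member. The following Python sketch assumes a hypothetical tree and takes $\searrow^*$ reflexively, as in the definition of a sort.

```python
from typing import Dict, Iterable, Iterator, Optional, Set

# Hypothetical tree: Sys (parallel) -> Train, Gate (serial); Train -> Move (serial), Stop.
parent: Dict[str, Optional[str]] = {"Sys": None, "Train": "Sys", "Gate": "Sys",
                                    "Move": "Train", "Stop": "Train"}
R = {"Train", "Gate", "Move"}  # the serial states


def up_chain(s: Optional[str]) -> Iterator[str]:
    """s, parent(s), ..., root."""
    while s is not None:
        yield s
        s = parent[s]


def lca(states: Iterable[str]) -> str:
    """Least common ancestor with respect to the reflexive substate relation."""
    chains = [list(up_chain(s)) for s in states]
    common = set(chains[0]).intersection(*chains[1:])
    return next(a for a in chains[0] if a in common)


def convex(A: Set[str]) -> Set[str]:
    """Convex(A): add every serial u with lca(A) ->* u ->* y for some y in A."""
    top = lca(A)
    result = set(A)
    for y in A:
        for u in up_chain(y):          # walk from y up to lca(A), inclusive
            if u in R:
                result.add(u)
            if u == top:
                break
    return result


print(sorted(convex({"Move", "Gate"})), sorted(convex({"Move"})))
# ['Gate', 'Move', 'Train'] ['Move']
```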

Two configurations $\sigma$ and $\sigma'$ are said to be $I$-equivalent, written $\sigma =_I \sigma'$,
whenever they agree on all states in $I$. More formally:

$\sigma =_I \sigma' \iff \forall s \in I : \pi_s(\sigma) = \pi_s(\sigma')$ .   (6)



^ Only if $\mathrm{lca}(I)$ is a serial state does this imply that $\mathrm{lca}(I) \in I$.
For notational convenience we write $\Sigma_I = \Sigma_{\mathrm{lca}(I)}$. A set $P \subseteq \Sigma_I$ of configurations
is said to be $I$-sorted in case

$\forall \sigma, \sigma' \in \Sigma_I : \sigma =_I \sigma' \Rightarrow (\sigma \in P \Leftrightarrow \sigma' \in P)$ .   (7)

Notice that we require $P \subseteq \Sigma_I$ for $P$ to be $I$-sorted. This follows from the
idea that the reusable reachability check restricts the analysis to the subsystem
with root $\mathrm{lca}(I)$. $P$ being $I$-sorted intuitively means that it only depends on
states within $I$. Using ROBDDs allows for very compact representations of
$I$-sorted sets, as the parts of the configuration set outside the sort may be ignored.

From an $I$-sorted set $X$ we perform, within $\Sigma_I$, a compositional backwards
computation step by including all configurations with $\mathrm{lca}(I)$ active which,
irrespective of the behaviour of the superstates outside $I$, can reach $X$. One
backward step is given by the function $CB_I$ defined by:

$CB_I(X) = \{\sigma \in \Sigma_I \mid \forall \sigma' \in \Sigma_I : \sigma =_I \sigma' \Rightarrow \exists \sigma'' \in X : \sigma' \to \sigma''\}$ .   (8)

Observe that $CB_I$ is monotonic in both $X$ and $I$. By iterating the application
of $CB_I$, we can compute the set of configurations that are able to reach a
configuration within $X$ independently of behaviours outside the considered sort $I$.
This is the minimum fixed point $\mu Y.\, X \cup CB_I(Y)$, which we refer to as $CB_I^*(X)$.
In an ROBDD-based implementation, the global transition relation may be
partitioned into conjunctive parts with contributions from each superstate. Crucial
for our approach is the fact that $CB_I$ may be computed without involving the
global transition relation directly, but only the parts of the partitioning relevant
for the considered sort $I$. We refer to [13] for a similar observation for flat SEMs.
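The set semantics of Eq. (8) and of $CB_I^*$ can be checked on a tiny explicit example. The Python sketch below groups $\Sigma_I$ into $=_I$-classes and keeps a class only if every member can step into $X$. It uses a global successor function, so it illustrates what is computed, not the partitioned ROBDD computation that avoids the global transition relation. The two-component system and the names `NAMES`, `SIGMA`, `succ` and `proj` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Conf = Tuple[int, ...]


def cb(I: Set[str], sigma_I: Iterable[Conf], X: Set[Conf],
       succ: Callable[[Conf], Iterable[Conf]],
       proj: Callable[[Conf, Set[str]], Hashable]) -> Set[Conf]:
    """CB_I(X), Eq. (8): every I-equivalent configuration must step into X."""
    classes: Dict[Hashable, List[Conf]] = defaultdict(list)
    for c in sigma_I:
        classes[proj(c, I)].append(c)  # group Sigma_I into =_I classes
    result: Set[Conf] = set()
    for members in classes.values():
        if all(any(d in X for d in succ(c)) for c in members):
            result.update(members)
    return result


def cb_star(I, sigma_I, X, succ, proj) -> Set[Conf]:
    """Least fixed point mu Y. X u CB_I(Y)."""
    Y = set(X)
    while True:
        Y_new = Y | cb(I, sigma_I, Y, succ, proj)
        if Y_new == Y:
            return Y
        Y = Y_new


# Hypothetical system: serial components a, b in {0, 1}; a configuration is (a, b).
NAMES = ("a", "b")
SIGMA = [(a, b) for a in (0, 1) for b in (0, 1)]


def succ(c: Conf) -> List[Conf]:
    a, b = c
    nxt = [(a, 1 - b)]         # b may toggle freely
    if a == 0 and b == 1:      # a may move 0 -> 1, guarded on b = 1
        nxt.append((1, b))
    return nxt


def proj(c: Conf, I: Set[str]) -> Tuple[int, ...]:
    return tuple(c[NAMES.index(n)] for n in sorted(I))


X = {(1, 0), (1, 1)}  # "a is in state 1": an {a}-sorted goal
print(sorted(cb_star({"a"}, SIGMA, X, succ, proj)))       # [(1, 0), (1, 1)]
print(sorted(cb_star({"a", "b"}, SIGMA, X, succ, proj)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

With the sort $\{a\}$ the goal cannot be established independently of b, since the move of a is guarded on b. With $\{a, b\}$ the backward iteration reaches every configuration, including $(0, 0)$, which is consistent with the monotonicity in the sort stated in Lemma 1.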

If computing $CB_I^*(X)$ does not resolve the reachability question, we extend
the sort $I$ with the states $\mathit{Dep}(I)$ (see Fig. 4) on which the behaviour of $I$ depends.
Now, extending $\mathit{Dep}$ to sets in the obvious pointwise manner, we say that a
sort $I$ is dependency closed provided $\mathit{Dep}(I) \subseteq I$. The basic properties of $CB_I^*$
are captured by the following lemma:

Lemma 1. Let $X$ be an $I$-sorted subset of $\Sigma$. For all sorts $I$, $J$ with $I \subseteq J$ the
following holds:

1. $CB_I^*(X) \subseteq CB_J^*(X)$,

2. $CB_J^*(X) = CB_J^*(CB_I^*(X))$,

3. $I$ dependency closed $\wedge\ \mathit{Init}(I) \cap CB_I^*(X) = \emptyset \Rightarrow CB_I^*(X) = CB_J^*(X)$.

The first property guarantees that we may conclude $X$ reachable as soon as
all initial configurations of some known reachable state are encountered (say the
global initial state). The second property allows us to reuse backwards
computations performed with one sort as the starting point for a larger sort. The last
property allows reachability of $X$ to be rejected in case $I$ is dependency closed
and no initial configuration of $I$ has been encountered (as no new configurations
will be encountered by extending the sort).

Algorithm 2 in Fig. 5 is the result of using the compositional backward step
$CB_I$ instead of $B_i$, with $\mathit{Minsort}(X)$ offering a minimal sort for the set of
configurations $X$. When the algorithm returns false, none of the configurations in
$X$ are reachable. If true is returned, it means that at least one goal configuration
is reachable under the assumption that $\mathrm{lca}(I)$ is known to be reachable.




Fig. 4. State c depends on u, due to the transition from e to u and since u is the 
parent of v upon which the transition is guarded. Likewise, the transition
from e to b creates dependencies from state a (the scope of the transition) to
state c (the parent of the source) and to state x (the parent of the state upon which
the transition is guarded).



I := Minsort(X)
while Init(I) ⊈ X and σ0 ∉ X do
begin
    X' := CB_I(X) ∪ X
    if X ≠ X' then
        X := X'
    else if Dep(I) ⊈ I then
        I := Convex(I ∪ Dep(I))
    else if Init(I) ∩ X ≠ ∅ then
        I := I ∪ {lsa(lca(I))}
    else
        return false
end
return true

Fig. 5. Algorithm 2, reusable and compositional reachability. 
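Algorithm 2 differs from Algorithm 1 mainly in how the considered subsystem grows. The following Python sketch mirrors Fig. 5 with all helpers (Minsort, Dep, Convex, Init, the upward extension and $CB_I$) passed in by the caller. The toy instantiation is hypothetical: two serial components under a parallel root, where component a depends on b. Since both lca states are active in every configuration, we assume $\mathit{Init}$ of every sort is $\{\sigma_0\}$.

```python
from collections import defaultdict


def compositional_reachable(goal, sigma0, minsort, dep, convex, init, extend_up, cb) -> bool:
    """Sketch of Algorithm 2 (Fig. 5) over explicit sets; helpers are caller-supplied."""
    X = set(goal)
    I = minsort(X)
    while not init(I) <= X and sigma0 not in X:
        X_new = cb(I, X) | X            # X' := CB_I(X) u X
        if X_new != X:
            X = X_new
        elif not dep(I) <= I:           # sort not dependency closed: widen it
            I = convex(I | dep(I))
        elif init(I) & X:               # some entry configurations of lca(I) reached
            I = extend_up(I)            # I u {lsa(lca(I))}
        else:
            return False
    return True


# Hypothetical toy system: serial components a, b in {0, 1}, configuration = (a, b).
# b may move 0 -> 1; a may move 0 -> 1 only when b = 1.  (1, 0) is unreachable.
NAMES = ("a", "b")
SIGMA = [(a, b) for a in (0, 1) for b in (0, 1)]
SUCC = {(0, 0): [(0, 1)], (0, 1): [(1, 1)], (1, 0): [(1, 1)], (1, 1): []}
DEP = {"a": {"b"}, "b": set()}


def cb(I, X):
    """CB_I(X) of Eq. (8); here Sigma_I is the whole of SIGMA."""
    classes = defaultdict(list)
    for c in SIGMA:
        classes[tuple(c[NAMES.index(n)] for n in sorted(I))].append(c)
    return {c for members in classes.values()
            if all(any(d in X for d in SUCC[m]) for m in members)
            for c in members}


def run(goal, first_sort):
    return compositional_reachable(
        goal, (0, 0),
        minsort=lambda X: frozenset(first_sort),   # stands in for Minsort(X)
        dep=lambda I: frozenset().union(*(DEP[s] for s in I)),
        convex=frozenset,                          # every sort is convex in this toy
        init=lambda I: {(0, 0)},                   # assumed Init = {sigma0}
        extend_up=lambda I: I,                     # never reached in this toy
        cb=cb)


print(run({(1, 0), (1, 1)}, {"a"}), run({(1, 0)}, {"a", "b"}))  # True False
```

In the first call the sort $\{a\}$ makes no progress, so it is widened with $\mathit{Dep}(\{a\}) = \{b\}$ before the initial configuration is found. In the second call the sort is already dependency closed and no initial configuration is encountered, so the goal is rejected.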



6 Experimental Results 

To evaluate our approach, the runtime and memory usage of an experimental
implementation using our method are compared to those of an implementation for flat
systems. We will refer to the first as the hierarchical checker and the second as
the flat checker. Both checkers utilise the compositional backwards analysis and
use ROBDDs to represent sets of states and transition relations, but only the
hierarchical checker uses the reusable reachability check. Only satisfiability of
transitions is verified, i.e., whether, for each transition, the system can reach a
configuration in which the transition is enabled. The hierarchical checker additionally
checks whether non-primitive states are reachable, since this is necessary
in order to apply the reusable reachability check.

The two implementations were first compared on flat test cases previously
used in [13]. Without going into details, adding the reusable reachability checking
did not degrade performance.

The lack of adequate examples has forced us to develop a method to generate
scalable hierarchical systems. It is possible to scale the maximum nesting
depth, the number of substates of parallel and serial states, and the total number
of serial states (which is equivalent to the number of automata in the flat system).
Serial and parallel states alternate on the path from the root to the leaves, starting
with a parallel state. The number of states is adjusted by pruning the state
tree, i.e., just because a system has a nesting depth of 12 does not mean that
all leaves are placed at level 12 (the size of such a system would be extreme). If
the generated system is not deep enough to accommodate the number of wanted
states with the chosen width of parallel and serial states, the width is expanded.
E.g., a system with 100 serial states and depth 1 will have a parallel root with
100 substates.

As stated in the introduction, we believe that in good designs, dependencies 
are more likely to be local. The generated test cases reflect this by only including 
transitions between nearby states. The guards are created at random, but the 
probability that a guard synchronises with a given state is inverse exponential 
to the distance between the scope of the transition and the state. The number of 
transitions is proportional to the number of serial states. Transitions are arranged 
so that any state is potentially reachable, i.e., if the transitions were unguarded 
all states would be reachable. Events are distributed such that the system is 
guaranteed to be deterministic. 

Figure 6 shows the runtime of both the hierarchical and the flat checker for
a fixed number of substates in parallel and serial states (4 in parallel and 3 in
serial), but with varying depth and number of serial states (which corresponds to
the number of automata in the equivalent flat system). It is interesting to notice
that the runtime of the hierarchical checker is much more consistent than that of
the flat checker, i.e., as the depth is increased, the runtime of the flat checker varies
greatly for different systems generated with the same parameters. Although
each grid point of the figures shows the mean time of 20 measurements,^ it is
still hard to achieve a smooth mesh for the flat checker.

While the flat checker suffers under the introduction of a hierarchy, the
hierarchical checker actually benefits from it. How can it be that the addition of a
hierarchy decreases the runtime of the hierarchical checker? As stated earlier, we

^ It took about two days to run the 1920 cases providing the basis of the 96 depicted
grid points. The test was performed on a Sun UltraSparc 2 with two 300 MHz
processors and 1 GB of RAM (although the enforced limit of 10® nodes assured a
maximal memory consumption below 20 MB).
Fig. 6. Comparison of the runtime of the flat and hierarchical checker. (a) The
runtime of both checkers is plotted as a function of the nesting depth and number
of automata/serial states. (b) A slice of the mesh where the number of automata
is 300. As can be seen, the runtime of the flat checker explodes as the depth
increases, whereas the runtime of the hierarchical checker decreases slightly.



believe that a good hierarchical design is modular in its nature. If a particular 
system cannot be easily described using a hierarchy, this is probably due to too 
many interdependencies in the system. Our test cases incorporate this idea: in
a system with depth one, the distance between any two states in two different 
superstates will be constant. Hence the probability with which a guard refers to 
a state in another superstate is constant, i.e., it is likely that many superstates 
depend on each other. 

It is worth noticing that our method allows us to drop reachability questions
which result in an unreachable initial lca state (in this case the answer will be
no). The number of questions dropped because of this is proportional to the
number of unreachable states in the test case. This number varies, but is most
of the time below 5-10% of the total number of checked states (primitive states
are not checked), although 50% unreachable states have been observed. Testing
whether the non-primitive states are reachable is very fast compared to the time
it takes to check the transitions. It is noteworthy that some test cases, even
without any unreachable states, showed a difference in runtime with a factor of
over 180 in favour of the hierarchical checker compared to the flat one.

Table 1 provides further information on the performance of the hierarchical 
checker on a single case with depth 12, 399 serial states, 3 substates in each 
parallel state, and 4 substates in each serial state.® This results in a total of 1596 
transitions, although optimisations did allow the checker to verify 331 transitions 
without performing a reachability analysis, leaving 1265 checks (not counting 
reachability checking of non-primitive states). The table shows the number of
questions distributed over the initial and final depth of the lca state of the
questions. For instance, we can see that 59 of the questions starting at depth



This corresponds to a state space of 10^^° configurations 
5 are verified without including additional states toward the root, but that 2
questions needed to expand the sort such that the final answer was found at 
depth 3. It is apparent that a large number of questions is verified in terms 
of a small subsystem. This illustrates why our method does scale as well as it 
does. This particular system is verified within 26 seconds using the hierarchical 
checker, whereas the flat checker uses 497 seconds. 



Table 1. Distribution of reachability questions. The vertical axis shows the ini- 
tial distance between the root and the subsystem analysed, and the horizontal 
axis shows the final distance. From the diagonal it can be seen that most ques- 
tions are answered without including additional states toward the root. 
7 Conclusion



In this paper we have presented a verification technique for hierarchical systems. 
The technique combines a new idea of reusability of reachability checks with 
a previously demonstrated successful compositional verification technique. The 
experimental results are encouraging: in contrast to a straightforward flatten- 
ing approach the new technique proves not only insensitive to the hierarchical 
depth, but even leads to improved performance as the depth increases (given a 
fixed number of serial states). A topic for further research is how to extend the
techniques to model checking of more general temporal properties and how to
combine them with the utilisation of multiple instantiations of abstract superstates.
Abstract. We present a new approach for proving safety properties
of reactive systems, based on tight interaction between static analysis, 
theorem proving and abstraction techniques. The method incrementally 
constructs a proof or finds a counterexample. Every step consists of ap- 
plying one of the techniques and makes constructive use of information 
obtained from failures in previous steps. The amount of user intervention 
is limited and is highly guided by the system at each step. We demon- 
strate the method on three simple examples, and show that by using 
it one can prove more properties than by using each component as a 
stand-alone. 



1 Introduction 

Theorem proving [GM95, ORS+95, CCF+97]^ is a powerful and general way to
verify safety properties of reactive systems, but its use in mechanical verification
requires a serious amount of both insightful and labor-intensive manual guidance
from the human verifier. Model checking [BCM+92, H91, LPY97] is largely
automatic, but it only addresses a limited class of essentially finite-state systems.
Abstraction [SUM96, DGG97, GS97, BLO98, GU98] can be used to translate an
infinite-state system to a finite-state system so as to preserve the property being 
verified. This can reduce the manual burden of the verification but the discovery 
of a suitable property-preserving abstraction takes considerable human ingenuity. 
Furthermore, when the abstracted system fails verification, this could either be 
because the abstraction was too coarse or because the system did not satisfy the 
property. It takes deep insight to draw useful information from such a failure. 
This paper addresses these problems by presenting a methodology for integrating 
static analysis [GG77, HPR97, BL], theorem proving, and abstraction that does 
not tax the patience and ingenuity of the human verifier. In this methodology 

* This research was supported by National Science Foundation grant CCR-9509931. 
The first author is also supported by a Lavoisier grant of the French Ministry of 
Foreign Affairs. 

^ Due to space limitations, we cite only a few of the relevant contributions in each 
domain. 
1. The choice of the abstraction mapping can be guided by the subgoals in a
failed proof attempt. 

2. A failed verification attempt at the abstract level suggests either strength- 
ened invariants or a more refined abstraction. 

3. The iterative process, when it terminates, yields a counterexample indicating 
how the property is violated or a proof that the property is satisfied. 

We also show that the combination of abstraction and theorem proving is strictly 
more powerful than verification based on theorem proving with systematic dy- 
namic invariant strengthening techniques. 

In our method, the verification starts with a one-time use of static analysis, 
generating true-by-construction invariants that are communicated to both the 
theorem-proving and abstraction components. The rest of the process involves 
a tight interaction between the prover and abstraction generator, in which each 
step makes constructive use of information obtained from failures in previous 
steps. The method assists the user in discovering relevant auxiliary invariants 
and suitable abstraction mappings while progressing towards a proof or a coun- 
terexample. Using this “small increments” approach the required amount of user 
ingenuity is reduced. Instead of having to rely on keen insight into the problem
right from the start, the user gains insight progressively as she advances in the
verification task, which eventually enables her to conclude.

The rest of the paper is organized as follows. In Section 2 we present some 
basic terminology and an overview of the static analysis, theorem proving and 
abstraction techniques that we are using. Section 3 presents our approach for 
integrating these techniques, which we introduce through the verification of a 
simple example. Section 4 contains a formal comparison of the relative power 
of the theorem proving and abstraction techniques, together with an example 
demonstrating that our method is strictly more powerful than using each
component stand-alone. We conclude in Section 5 and present some directions
for future work.

2 The Components 

We use transition systems as a computational model for reactive programs. A
transition system $T$ consists of a finite set of typed variables $V$, an initial
condition $\Theta$ and a finite set of guarded transitions $\mathcal{T}$. The variables can be either
control or data variables; the control variables are of a finite type Location. Each
transition $\tau \in \mathcal{T}$ is labeled and consists of a guard and an assignment. A state
is a type-consistent valuation of the variables, and the initial condition $\Theta$ is a
predicate on states. Each transition $\tau$ induces a transition relation $\rho_\tau$ relating
the possible before and after states. The global transition relation of the system
is $\rho_T = \bigcup_{\tau \in \mathcal{T}} \rho_\tau$.

A computation of the transition system is an infinite sequence of states, in which 
the first state satisfies the initial condition and every two consecutive states are 
in the transition relation. The parallel (asynchronous) composition of transition 
systems is defined using interleaving in the usual manner. For a transition $\tau$
and a state predicate $\varphi$, the predicate $\widetilde{pre}_\tau(\varphi)$ characterizes all the states from
which, after taking transition $\tau$, the predicate $\varphi$ holds:

$$\widetilde{pre}_\tau(\varphi):\ \forall s'.\ \rho_\tau(s, s') \supset \varphi(s').$$

Likewise, $post_\tau(\varphi)$ characterizes the states that can be reached by taking
transition $\tau$ from some state satisfying $\varphi$:

$$post_\tau(\varphi):\ \exists s.\ \rho_\tau(s, s') \wedge \varphi(s).$$

These predicates are also defined globally for the transition system $T$:

$$\widetilde{pre}_T(\varphi):\ \forall s'.\ \rho_T(s, s') \supset \varphi(s'), \qquad post_T(\varphi):\ \exists s.\ \rho_T(s, s') \wedge \varphi(s).$$

In the sequel, we omit $T$ when it is understood from the context.
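As a concrete illustration of the two operators, the following Python sketch assumes a finite system in which a predicate is represented extensionally, as the set of states satisfying it, and each transition relation as a set of (before, after) pairs. The four-state system and the names `pre_tilde`, `post`, `TRANSITIONS` and `RHO_T` are hypothetical; the tools discussed in this paper work on symbolic predicates rather than by enumeration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

State = int
Relation = Set[Tuple[State, State]]


def pre_tilde(rho: Relation, states: Iterable[State], phi: FrozenSet[State]) -> FrozenSet[State]:
    """forall s'. rho(s, s') -> phi(s'): every rho-successor of s satisfies phi.
    A state with no rho-successor satisfies this vacuously."""
    return frozenset(s for s in states if all(t in phi for (u, t) in rho if u == s))


def post(rho: Relation, phi: FrozenSet[State]) -> FrozenSet[State]:
    """exists s. rho(s, s') and phi(s): states reachable in one rho-step from phi."""
    return frozenset(t for (u, t) in rho if u in phi)


# Hypothetical four-state system: "inc" counts 0 -> 1 -> 2 -> 3, "idle" loops on 3.
STATES = range(4)
TRANSITIONS: Dict[str, Relation] = {"inc": {(0, 1), (1, 2), (2, 3)}, "idle": {(3, 3)}}
RHO_T: Relation = set().union(*TRANSITIONS.values())  # global relation: union of the rho_tau

if __name__ == "__main__":
    goal = frozenset({2, 3})
    print(sorted(pre_tilde(TRANSITIONS["inc"], STATES, goal)))  # [1, 2, 3]
    print(sorted(post(RHO_T, frozenset({0, 1}))))               # [1, 2]
```

State 3 belongs to the per-transition result for inc only vacuously, since inc is not enabled there; this is the universal reading of the definition.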

We now briefly describe the static analysis, theorem proving and abstraction
techniques we integrate in our approach. It should be stressed that the identity
of the particular tools we use is not the main point here, but rather the way in
which we integrate them. One could, for example, use Polka [HPR97] as the
static analysis tool, InVesT [BLO98] as the abstraction tool, etc.



2.1 Static Analysis 

For automatically generating invariants we use a method similar to that
suggested by [BLO98]. The analysis starts by computing local invariants at every
control location: the local invariant of a control location $\ell$ is the disjunction
of $post_\tau(\mathit{true})$, for all transitions $\tau$ leading to $\ell$. Then, the local invariants are
propagated to other control locations of the system to obtain global invariants.
For example, in the simple transition system illustrated below, static analysis
yields the local invariant $\Box(pc = 1 \supset x > 0)$. Then, since $x > 0$ is preserved by
transition inc, it is a global invariant.



[Figure: a simple transition system with the transition inc: true → x := x + 1]




2.2 Theorem Proving 

We use PVS [ORS+95] for invariant strengthening [GS96, HS96]. Given a
transition system $T$ and a state predicate $I$, we say that $I$ is inductive if $I \supset \widetilde{pre}(I)$.
Obviously, if $I$ is inductive and holds at the initial state of $T$, then $I$ is an
invariant of $T$. When $I$ is not inductive, we can strengthen it by taking $I \wedge \widetilde{pre}(I)$ and
check if the latter is inductive, that is, whether $I \wedge \widetilde{pre}(I) \supset \widetilde{pre}(I \wedge \widetilde{pre}(I))$; or
equivalently, since $\widetilde{pre}$ distributes over conjunction, whether $I \wedge \widetilde{pre}(I) \supset \widetilde{pre}^2(I)$.
In general, this procedure terminates if there exists an $n$ such that

$$I \wedge \widetilde{pre}(I) \wedge \cdots \wedge \widetilde{pre}^n(I) \supset \widetilde{pre}^{n+1}(I). \qquad (1)$$

In this case, it follows that $I \wedge \widetilde{pre}(I) \wedge \cdots \wedge \widetilde{pre}^n(I)$ is inductive; in particular,
if this conjunction also holds initially, $I$ is an invariant.

This technique can be implemented in PVS as follows. We use a simple
invariance rule stating that $I$ is an invariant if it is true initially and is preserved by
all transitions. If $I$ is inductive, then applying the rule once completes the
proof. Otherwise, the prover presents a number of pending (unproved) subgoals:
each subgoal results from the fact that $I$ is not preserved by some transition. We
then apply the invariance rule to the predicate obtained by taking the
conjunction of $I$ and all the unproved subgoals: this amounts to attempting to prove that
$I \wedge \widetilde{pre}(I)$ is inductive. If there exists an $n$ such that (1) holds, then repeating
this process $n$ times eliminates all the subgoals and completes the proof.
This leads to a fully automatic procedure (that is not guaranteed to halt).
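The strengthening loop can be prototyped on an explicit finite system, where it always halts. The sketch below is illustrative only (the six-state system and all names are hypothetical, and PVS manipulates symbolic predicates, not state sets): it conjoins $\widetilde{pre}(J)$ to the current predicate $J$ until $J$ is inductive and reports the number of steps, which plays the role of $n$ in condition (1).

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Relation = Set[Tuple[int, int]]


def pre_tilde(rho: Relation, states: Iterable[int], phi: FrozenSet[int]) -> FrozenSet[int]:
    """forall s'. rho(s, s') -> phi(s'), computed over an explicit finite state set."""
    return frozenset(s for s in states if all(t in phi for (u, t) in rho if u == s))


def strengthen(states: Iterable[int], rho: Relation, init: Iterable[int], inv: Iterable[int],
               max_steps: int = 100) -> Optional[Tuple[int, FrozenSet[int], bool]]:
    """J_0 = I and J_{k+1} = J_k & pre~(J_k), stopping as soon as J_k is inductive.

    Since pre~ distributes over conjunction, J_n = I & pre~(I) & ... & pre~^n(I), so
    J_n is inductive exactly when condition (1) holds for n. Returns
    (n, J_n, J_n holds initially), or None if max_steps is exhausted.
    """
    states = list(states)
    j = frozenset(inv)
    for n in range(max_steps):
        p = pre_tilde(rho, states, j)
        if j <= p:                                   # J_n implies pre~(J_n): inductive
            return n, j, frozenset(init) <= j
        j = j & p                                    # one strengthening step
    return None


# Hypothetical system: a cycle 0 -> 1 -> 2 -> 0 and a path 3 -> 4 -> 5 with 5 looping.
STATES = range(6)
RHO: Relation = {(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 5)}
INV = frozenset(s for s in STATES if s != 5)         # I: "never in state 5"

if __name__ == "__main__":
    print(strengthen(STATES, RHO, {0}, INV))  # (2, frozenset({0, 1, 2}), True)
    print(strengthen(STATES, RHO, {3}, INV))  # (2, frozenset({0, 1, 2}), False)
```

From initial state 0, two strengthening steps yield an inductive predicate that holds initially, so $I$ is proved invariant. From state 3 the same fixpoint is reached but excludes the initial state; in this explicit setting that means state 3 can reach state 5.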



2.3 Abstraction 

We use the abstraction technique described in [GS97]. The abstraction of a
concrete transition system $T$ relative to a finite set of state predicates
$B = \{B_1, \ldots, B_k\}$, called boolean abstract variables, is a transition system denoted
$T/B$. The states of the abstract system $T/B$ are called abstract states; every
abstract state is labeled with a valuation of the control variables of $T$ and of the
abstract variables. Let us now briefly describe how $T/B$ is constructed.

The initial abstract state is labeled with the initial control configuration of $T$
and with the truth values of the abstract variables at the initial concrete state.
Assume now that $s^a$ is an abstract state; the abstract transitions going out of
$s^a$ are then generated as follows. Every concrete transition $\tau$, originating from a concrete
state with the same control configuration as $s^a$, can give rise to several abstract
transitions. Each of these transitions has the same label as $\tau$ and leads to an
abstract state obtained by computing (with PVS) the effect of $\tau$ (starting from
$s^a$) on the control and abstract variables.

Consider, for example, the concrete system illustrated below. An abstraction
relative to $B_1: (x = 0)$ and $B_2: (x = 1)$ generates the abstract system (a),
while an abstraction relative only to $B_2$ yields the abstract system (b), of which
only the initial portion is shown. Note that in the latter, simulating the concrete
transition inc gives rise to two successors. This is because, starting at the initial
abstract state, where $\neg(x = 1)$ holds, the transition inc performing $x := x + 1$
can lead either to a state in which $(x = 1)$ is true or to a state in which it
is false. Note also that in the abstract system (a), the only state labeled
$pc = 2$ is also labeled $(x = 1)$; we say this abstraction “shows” the property
$\Box(pc = 2 \supset (x = 1))$. On the other hand, the abstraction (b) does not show
this property, since there exists an abstract state labeled $pc = 2$ and $\neg(x = 1)$.
[Figure: the concrete system with transition inc: true → x := x + 1, and its abstract systems (a) and (b)]



To define the notion of “abstraction showing a property” we interpret the
labelling of each abstract state $s^a$ as a predicate $\pi(s^a)$ on the concrete variables;
for instance, the predicate associated with the initial state of system (b) above is
$(pc = 1) \wedge \neg(x = 1)$. Let $T/B$ be the abstraction of a concrete system $T$ relative
to the set of abstract variables $B = \{B_1, \ldots, B_k\}$, and let $\varphi$ be a state predicate.
We say that an abstract state $s^a$ shows $\varphi$ if $\pi(s^a)$ implies $\varphi$. We say that $T/B$
shows $\Box\varphi$, denoted $T/B \models_a \Box\varphi$, if all abstract states show $\varphi$. The crucial
feature of these boolean abstractions, which is true by construction, is that for
every computation

$$s_0 \xrightarrow{\tau_0} s_1 \xrightarrow{\tau_1} \cdots$$

of the concrete system $T$, there exists an abstract trace

$$s^a_0 \xrightarrow{\tau_0} s^a_1 \xrightarrow{\tau_1} \cdots$$

such that for $i = 0, 1, \ldots$, the labels of the abstract and concrete transitions
coincide, and the boolean values of the abstract variables in $s^a_i$ and in $s_i$ coincide.
Consequently, boolean abstractions are useful for proving invariants, since

$$T/B \models_a \Box\varphi \;\Rightarrow\; T \models \Box\varphi.$$

In general, an abstraction relative to a larger set of abstract variables can “show”
more properties: the prover has more information at its disposal when
new abstract states are generated, so it can eliminate some of them, yielding a
finer abstraction. Likewise, taking known invariants of the concrete system into
account when constructing an abstraction can help eliminate irrelevant abstract states.
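The construction and the “shows” check can be prototyped on an explicit, bounded state space. The Python sketch below is an assumption-laden illustration, not the PVS-based procedure of [GS97]: instead of discharging proof obligations, it enumerates every concrete state carrying a given label, reachable or not, which gives the exact abstract successors for the chosen predicates. The concrete system (pc ∈ {1, 2}, x ∈ {0, …, 4}, inc: pc = 1 ∧ x < 4 → pc := 2, x := x + 1) is hypothetical, chosen so that the two predicate sets behave as described for abstractions (a) and (b).

```python
from typing import Callable, Dict, List, Set, Tuple

Concrete = Tuple[int, int]            # (pc, x)
Abstract = Tuple[object, ...]         # (pc, b_1, ..., b_k)
Pred = Callable[[Concrete], bool]

# Hypothetical concrete system: initially pc = 1 and x = 0.
STATES: List[Concrete] = [(pc, x) for pc in (1, 2) for x in range(5)]
INIT: List[Concrete] = [(1, 0)]


def inc(s: Concrete) -> List[Concrete]:
    pc, x = s
    return [(2, x + 1)] if pc == 1 and x < 4 else []


TRANSITIONS: Dict[str, Callable[[Concrete], List[Concrete]]] = {"inc": inc}


def label(s: Concrete, preds: List[Pred]) -> Abstract:
    """Control value of s plus the truth values of the abstract variables in s."""
    return (s[0],) + tuple(p(s) for p in preds)


def concretize(a: Abstract, preds: List[Pred]) -> List[Concrete]:
    """All concrete states satisfying pi(a), reachable or not."""
    return [s for s in STATES if label(s, preds) == a]


def build(preds: List[Pred]) -> Tuple[Set[Abstract], Set[Tuple[Abstract, str, Abstract]]]:
    """Abstract states reachable from the initial one; tau leads from a to the label of
    every tau-successor of every concrete state satisfying pi(a)."""
    init = {label(s, preds) for s in INIT}
    seen, todo, edges = set(init), list(init), set()
    while todo:
        a = todo.pop()
        for name, step in TRANSITIONS.items():
            for s in concretize(a, preds):
                for t in step(s):
                    b = label(t, preds)
                    edges.add((a, name, b))
                    if b not in seen:
                        seen.add(b)
                        todo.append(b)
    return seen, edges


def shows(preds: List[Pred], phi: Pred) -> bool:
    """T/B shows box(phi) if pi(a) implies phi for every abstract state a."""
    states, _ = build(preds)
    return all(phi(s) for a in states for s in concretize(a, preds))


def phi(s: Concrete) -> bool:
    return s[0] != 2 or s[1] == 1                    # pc = 2 -> x = 1


B_A: List[Pred] = [lambda s: s[1] == 0, lambda s: s[1] == 1]   # B1: x = 0, B2: x = 1
B_B: List[Pred] = [lambda s: s[1] == 1]                        # B2 only

if __name__ == "__main__":
    print(shows(B_A, phi), shows(B_B, phi))          # True False
```

With `B_A` the only abstract state with pc = 2 represents x = 1, so the property is shown. With `B_B` the initial abstract state (pc = 1, ¬(x = 1)) has two inc-successors, one of which represents states with pc = 2 and x ≠ 1.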



3 The Integration 



We introduce our approach for integrating the previously discussed static analy- 
sis, theorem proving and abstraction techniques. The general scheme is presented 
in Fig. 1. 
[Figure 1: general scheme of the integration]




We demonstrate the method on a simple mutual exclusion problem for three
identical processes (illustrated below), in which the semaphore $S$ is a shared
variable. The property to be proved is that it is never the case that all three
processes are in their critical sections simultaneously; this is expressed as $\Box I$
with

$$I:\ \neg((pc_1 = cs) \wedge (pc_2 = cs) \wedge (pc_3 = cs)).$$
[Figure: the three identical processes sharing the semaphore $S$; only fragments survive: a transition $request_i$ and a transition $true \rightarrow S := S + 1$]

The first step is to employ the Invariant generator. This yields the global
invariant $\Box(S \geq 0)$, which is fed to the Theorem prover and to the Abstraction
generator, since it contains relevant information that may be useful in the sequel.

The next step is to use the theorem prover to try to prove that $I$ is
inductive. In our case, $I$ is not inductive, and therefore the proof is not completed.
Rather, we are presented with three (symmetric) pending subgoals, resulting
from transitions that do not preserve $I$. For example, the following subgoal is
related to transition $request_3$, when the third process attempts to enter the critical
section while the other two processes are already there:

Assuming:

- $pc_1 = cs$

- $pc_2 = cs$

- $S > 0$

Prove:

- $\neg(pc'_3 = cs)$

Obviously, the only way to prove this implication is to show that the assumptions
are contradictory; but $I$ alone is too weak to prove it. The user now has two
alternatives: either to remain in the prover and try to strengthen $I$, or to try to
prove the pending subgoals using an abstraction. User-dependent decisions are
represented in the diagram of Fig. 1 by dashed lines. Here, we choose the latter
alternative. From the pending subgoals we identify the predicate $(S > 0)$ as a
potentially relevant abstract variable and use the Abstraction generator to build
the abstract system $T/\{(S > 0)\}$. The generated abstract system is then passed
to the Trace analyzer together with a user-defined wish.

A wish is a transition-related state property to be checked on the abstract
system which, if shown correct, would make it possible to eliminate an unproved
subgoal. The transition to which a wish refers is the one that gave rise to the
corresponding subgoal.

Formulating a wish is straightforward. For example, a wish corresponding to the
subgoal above is: “for every abstract transition labeled $request_3$, if the origin
abstract state is labeled $pc_1 = cs$ and $pc_2 = cs$, then it is also labeled $\neg(S > 0)$”.

The role of the Trace analyzer is to find an abstract state that violates the
wish. If there is no violating state, then the wish is granted and this information
is passed back to the prover, allowing it to complete the corresponding subgoal.
In our example, however, there exists a violating abstract state, and the Trace
analyzer returns an abstract trace (starting from the initial abstract state) leading
to it.




This means that either mutual exclusion is not guaranteed by the program, or 
that the abstraction is too coarse. To decide between these two we must check 
whether this violating trace can be matched by a concrete computation. 
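The Trace analyzer's task, searching the abstract system for a state that violates a wish and returning a trace to it, can be sketched as a breadth-first search. Since the program figure is not reproduced here, the sketch assumes a reconstruction consistent with the text: $request_i$ is guarded by $S > 0$ and decrements $S$, $release_i$ increments $S$, $S = 2$ initially, and $S \geq 0$ is known. The abstract transitions of $T/\{(S > 0)\}$ are written out by hand under those assumptions; all names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical reconstruction (the program figure is not reproduced here):
#   request_i: pc_i = nc and S > 0 -> pc_i := cs, S := S - 1
#   release_i: pc_i = cs           -> pc_i := nc, S := S + 1
# with all pc_i = nc and S = 2 initially.
# An abstract state is (pc_1, pc_2, pc_3, b), where b is the abstract variable (S > 0).
AState = Tuple[str, str, str, bool]
INIT: AState = ("nc", "nc", "nc", True)


def successors(a: AState) -> Iterator[Tuple[str, AState]]:
    """Abstract transitions of T/{(S > 0)}, written out by hand under the assumptions above."""
    *pcs, b = a
    for i in range(3):
        new = list(pcs)
        if pcs[i] == "nc" and b:
            new[i] = "cs"
            for b2 in (True, False):                 # S - 1 > 0 may or may not hold
                yield f"request{i + 1}", (*new, b2)
        elif pcs[i] == "cs":
            new[i] = "nc"
            yield f"release{i + 1}", (*new, True)    # S >= 0 implies S + 1 > 0


def wish_request3(a: AState) -> bool:
    """If the origin of request3 is labeled pc_1 = cs and pc_2 = cs, it is labeled not (S > 0)."""
    pc1, pc2, _, b = a
    return not (pc1 == "cs" and pc2 == "cs") or not b


def find_violation(init: AState) -> Optional[Tuple[List[AState], List[str]]]:
    """Breadth-first search for a reachable origin of request3 that violates the wish.
    Returns the abstract states and labels of a shortest trace to it, or None."""
    parent: Dict[AState, Optional[Tuple[AState, str]]] = {init: None}
    queue = deque([init])
    while queue:
        a = queue.popleft()
        succs = list(successors(a))
        if any(lbl == "request3" for lbl, _ in succs) and not wish_request3(a):
            states, labels = [a], []
            while parent[a] is not None:
                a, lbl = parent[a]
                states.append(a)
                labels.append(lbl)
            return states[::-1], labels[::-1]
        for lbl, b in succs:
            if b not in parent:
                parent[b] = (a, lbl)
                queue.append(b)
    return None                                      # the wish is granted


if __name__ == "__main__":
    print(find_violation(INIT))
```

Under these assumptions the search returns the three-state trace init, $request_1$, $request_2$, ending in a state labeled $pc_1 = cs$, $pc_2 = cs$ and $S > 0$, which matches the situation described in the text.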

This task is performed by the Trace simulator, by simulating the transitions 
of the violating trace on the concrete system. It checks whether after every 
transition the valuation of the abstract variables in the concrete and abstract 
systems coincide. If this is the case, then we have a counterexample. Here, it is not
the case, since a mismatch is detected in the third abstract state: according
to the concrete computation, $S = 0$ should hold, but in the abstract system,
$S > 0$ holds. Thus, the abstraction is too coarse. In this situation, the simulator
outputs a warning message indicating what “went wrong” in the abstraction; this 
information is obtained by computing the pre-images of the abstract variables 
on the violating trace. In our example, the message suggests that the abstraction 
“believes” that initially S > 2 holds. 
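The Trace simulator step can be sketched as a concrete replay of the abstract trace. The sketch reuses the hypothetical reconstruction of the program ($request_i$ needs $S > 0$ and decrements $S$, $release_i$ increments it, $S = 2$ initially). Its pre-image computation is deliberately simple and only valid because every assumed transition changes $S$ by a constant, so pulling $(S > 0)$ back to the initial state just shifts the bound; a real tool would compute weakest preconditions symbolically.

```python
from typing import List, Optional, Tuple

# Same hypothetical reconstruction: request_i needs S > 0 and sets S := S - 1,
# release_i sets S := S + 1, and S = 2 initially.
EFFECT = {"request": -1, "release": +1}


def simulate(labels: List[str], abstract_b: List[bool],
             s0: int = 2) -> Optional[Tuple[int, int, str]]:
    """Replay an abstract trace on the concrete system, comparing the abstract variable
    (S > 0) after every step. Returns (index of the first mismatching state, concrete S,
    warning) or None when a concrete computation matches the trace (a counterexample)."""
    s, delta = s0, 0                                 # delta: net change of S so far
    for k, b in enumerate(abstract_b):
        if (s > 0) != b:
            # Pre-image of (S > 0) at the initial state: S + delta > 0, i.e. S > -delta.
            return k, s, f"the abstraction believes that initially S > {-delta}"
        if k < len(labels):
            kind = labels[k].rstrip("0123456789")
            if kind == "request" and not s > 0:
                return k, s, f"{labels[k]} is not enabled in the concrete system"
            s += EFFECT[kind]
            delta += EFFECT[kind]
    return None


if __name__ == "__main__":
    print(simulate(["request1", "request2"], [True, True, True]))
```

For the trace in the text, the mismatch is found at index 2 (the third abstract state) with $S = 0$, and the warning states that the abstraction believes that initially $S > 2$.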

The user now has two options to pursue. The first is to do another abstraction
relative to a larger set of abstract variables (obtained by adding the new ones
suggested by the trace simulator as “responsible” for the mismatches). For
example, $S > 2$ is a new relevant abstract variable. The second option is to
formulate a conjecture and try to prove it in the theorem prover. A conjecture is
an auxiliary invariant that would assist in generating a finer abstraction. In our
case, an obvious conjecture is $\Box(S \leq 2)$. If it were proved, then taking it into
account when the next abstraction is computed would eliminate some abstract
traces (e.g., the previous violating trace).

We pursue the latter alternative. The proof of $\Box(S \leq 2)$ does not succeed
in one invariant strengthening step. From the new unproved subgoals we extract
two new abstract variables: $(S \leq 2)$ and $(S \leq 1)$. We compute the abstract
system $T/\{(S > 0), (S \leq 2), (S \leq 1)\}$, which is fine enough to grant our
original wishes. Armed with this information the prover eliminates the (original)
unproved subgoals and completes the proof of mutual exclusion.



As another example, we consider a version of the alternating bit protocol 
taken from [GS97] (see Fig. 2 below). 
[Figure: transitions of the sender, receiver and environment processes, as far as they are legible:
New_message: true → b := not(b), sent := add(get_new_message(), sent)
Send_message: true → message_channel := (head(sent), b), message_present := true
Resend_message: not(message_present and ack_channel = b) → message_channel := (head(sent), b), message_present := true
Receive_ack: guard message_present and ack_channel = b (assignment not legible)
Receive_message: message_present and message_channel.bit = c → received := add(message_channel.message, received)
Receive_old_message: not message_present or not message_channel.bit = c → ack_channel := c, ack_present := true
Send_ack: true → c := not(c), ack_channel := c
Lose_message: message_present := false
Lose_ack: ack_present := false
Initially, RECEIVER: received = null, c = false; ENVIRONMENT: message_present = false, ack_present = false.]

Figure 2: An Alternating bit Protocol.



There are three processes: sender, receiver and environment. The sender
generates messages, records them in the sent list, then sends them to the receiver over
the communication medium message_channel. The latter is modeled as a
one-place buffer that can hold a message and a bit. The receiver records successfully
received messages in the received list and sends an acknowledgment through
the one-place buffer ack_channel. The environment can lose messages and
acknowledgements by setting the boolean flags message_present and ack_present
to false. This causes the sender and the receiver, respectively, to retransmit. The safety
property to be proved is that the (unbounded) lists sent and received always
differ by at most one message: $\Box(sent = received \vee sent = tail(received))$.

The first step, static analysis, yields two invariants that are fed to the prover 
and to the abstraction generator. The next step is theorem proving, and since the 
property is not inductive, the proof is not completed. There are three pending 
subgoals, all of which are related to transitions that update the sent/received
lists.

For example, we have to prove that at the origin of transition receive_message,
$sent = tail(received)$ holds. We take this predicate as an abstract variable, and
formulate the above as a wish. (We also used two other similar abstract variables and
corresponding wishes, which are omitted here.)

After the abstraction has been computed, the trace analyzer returns a
violating trace in which a receive_message transition is taken from the initial abstract
state. From the trace simulator we get a warning message indicating that the
problem occurred because the transition receive_message should not have been
enabled initially, and that the predicate “responsible” for this is the conjunct
message_channel.bit = c in the guard of the transition.

The obvious choice now is to take this predicate as a new abstract variable 
and to redo an abstraction. Still, the second abstraction does not grant our 
wishes; a new violating trace is detected and another abstract variable is sug- 
gested by the same mechanism described above. The third abstraction grants all 
original wishes, and then the prover completes the proof. 

In [GS97] the same protocol is analyzed by an abstraction relative to a set 
of sub-formulas of the guards, and human inspection of the generated abstract 
system is necessary to conclude that the protocol is indeed correct. Our approach 
is different: the abstract variables are suggested to the user by the failures of 
previous proof attempts and abstractions; the analysis of the abstract system 
is automatic and it issues information to the user; and in the end we obtain a 
complete rigorous proof. 

A significant part of our method can be automated. Indeed, all the
components in the diagram (Fig. 1) perform automatic tasks, and user intervention is
basically limited to choosing between abstraction and theorem proving. In both
cases, the user is assisted in providing the relevant abstract variables, wishes and
conjectures by the pending subgoals in the prover and by the warning messages
issued by the trace simulator. The method is incremental: progress is made in
each step, as every relevant abstract variable and conjecture reduces the search
space; and the user gains insight into the problem while progressing towards a
proof or a counterexample. Finally, we show in the next section that by
integrating the components it is possible to prove more properties than by automatic
invariant strengthening or automatic abstraction used stand-alone.

4 Integration is More Powerful 

We now define the class of safety properties that can be proved to be invariant
by the automatic invariant strengthening technique described in Section 2.2. For
a transition system $T$ and $n = 0, 1, \ldots$ consider the set

$$INV_n(T) = \{\Box I \mid I \wedge \widetilde{pre}(I) \wedge \cdots \wedge \widetilde{pre}^n(I) \supset \widetilde{pre}^{n+1}(I)\}. \qquad (2)$$
Definition 1. The class $INV(T)$ of safety properties that can be proven by
pre-invariant strengthening is $\bigcup_{n \geq 0} INV_n(T)$.

Next, we define a particular class of properties that can be shown by the abstrac- 
tion mechanism described in Section 2.3. Given a state predicate I, we consider 
the set of predicates 

$$AV(I) = \{I, \widetilde{pre}(I), \ldots, \widetilde{pre}^n(I), \ldots\}$$

Definition 2. The class $ABS(T)$ of safety properties that can be shown by
pre-abstraction is the class of properties $\Box I$ for which there exists a finite subset
$B \subseteq AV(I)$ such that $T/B \models_a \Box I$.

It should be stressed that choosing the abstract variables from $I, \widetilde{pre}(I), \ldots,
\widetilde{pre}^n(I)$ is not arbitrary: the guards of transitions, which in many cases allow
useful control abstractions to be generated [GS97], are just sub-formulas of these
predicates.

Note that both pre-invariant strengthening and pre-abstraction are fully au- 
tomatic techniques. Under the assumption that the same “reasoning power” is 
used for both pre-invariant strengthening and pre-abstraction (for example, both 
use the same theorem prover), the following result holds. 

Theorem 1. A safety property can be proved by pre-invariant strengthening iff
it can be shown by pre-abstraction.

Proof (sketch). First, for every $n = 0, 1, \ldots$, define the finite set of predicates
$AV_n(I)$ as

$$AV_n(I) = \{I, \widetilde{pre}(I), \ldots, \widetilde{pre}^n(I)\}.$$

Then, the set $ABS_n(T)$ of safety properties that can be shown by abstraction
relative to a subset of $AV_n(I)$ is

$$ABS_n(T) = \{\Box I \mid \exists B \subseteq AV_n(I) \text{ s.t. } T/B \models_a \Box I\}.$$

Thus, by Definition 2, $ABS(T) = \bigcup_{n \geq 0} ABS_n(T)$. Next, recall that by
Definition 1, the class of properties that can be proved by pre-invariant strengthening
is $\bigcup_{n \geq 0} INV_n(T)$. Finally, it is not difficult to prove that

$$ABS_n(T) \subseteq INV_n(T) \subseteq ABS_{n+1}(T)$$

and the result follows. □

In our method, when trying to prove a safety property $\Box I$, the abstract
variables and conjectures are also variants of sub-formulas of $AV(I)$. As is
shown in the following example, however, our method is strictly more power- 
ful than the fully automatic techniques of pre-invariant strengthening and of 
pre-abstractions. 

The example is a mutual-exclusion algorithm taken from [BGP97], and is 
based on the same principle as the well-known Bakery Algorithm: using “tickets” 
to control access to the critical section. The program is illustrated in Fig. 3: two 
global variables $t_1$ and $t_2$ are used for keeping record of ticket values, and two
local variables a and b control the entry to the critical sections. 
[Figure 3: the ticket-based mutual-exclusion program]




The mutual-exclusion property is formulated as $\Box I$ where

$$I:\ \neg(pc_1 = cs \wedge pc_2 = cs). \qquad (3)$$

We employ our method to prove this property. Static analysis generates the local
invariants $pc_1 = cs \supset a < t_1$ and $pc_2 = cs \supset b < t_1$, which are then passed to
the theorem prover and to the abstraction generator. Theorem proving yields
two unproved subgoals, from which we identify the predicates $(a < t_1)$ and
$(b < t_1)$ as relevant abstract variables (note that these predicates are simply
the guards). The wish associated with the transition in-a is: “any abstract state
labeled $pc_1 = nc$, $pc_2 = cs$ is also labeled $\neg(a < t_1)$”. That is, the guard $(a < t_1)$
should prevent the first process from entering its critical section while the second
is already there. A similar wish is associated with the transition in-b.

The first abstraction does not grant these wishes. A violating trace is pro- 
duced by the trace analyzer and fed to the trace simulator, which identifies it as 
not corresponding to a concrete computation; thus, the abstraction is too coarse. 
By computing pre-images of the abstract variable $(a < t_1)$, the system outputs
a warning message indicating that the error occurred since the abstraction
“believes” that initially $t_1 < t_2 - 1$.

The user now has two options. The first is to add $t_1 < t_2 - 1$ as a new
abstract variable and do another abstraction. The second is to formulate a con- 
jecture and try to prove it. Choosing the former alternative is reasonable since 
it would undoubtedly result in a finer abstraction. When it is not too difficult 
to come up with a conjecture, however, the latter is preferred. This is because a 
proved (stronger) conjecture usually eliminates more violating traces in further 
abstractions, and therefore significantly reduces the number of iterations. 

In our example this is the case, since it is easy to see that whenever both
processes are at their init location, the stronger relation $t_1 = t_2$ (rather than
$\neg(t_1 < t_2 - 1)$) should hold (this is true initially, and any loop that goes back to the
init locations increases both $t_1$ and $t_2$ by one). So, we formulate the conjecture

$$\Box(pc_1 = init \wedge pc_2 = init \supset t_1 = t_2). \qquad (4)$$

In the prover, (4) is proved by three iterations of invariant strengthening. We 
then use it in a second abstraction (also relative to $(a < t_1)$ and $(b < t_1)$). This
time, the wishes are granted, and the prover can discharge the unproved subgoals 
and complete the proof. 

An interesting conclusion can be drawn from this simple example. While the 
conjecture (4) can be proved by invariant strengthening, this is not the case 
for the mutual-exclusion property $\Box I$ itself. As shown in [BGP97], backwards
analysis for this property does not converge, and hence (3) cannot be proved by 
pre-invariant strengthening. Therefore, by Theorem 1, mutual exclusion cannot 
be shown by pre-abstraction, either. 

Moreover, it is not difficult to prove that even an abstraction relative to any 
finite set of sub-formulas of pre-images of / (such as the guards of the transitions) 
cannot show (3). The reason for this is that to prove (3) it is important to
know when $t_1 = t_2$ holds, but the pre-images of $I$ express only weaker relations
between $t_1$ and $t_2$. (In the example we have obtained this information by proving
the conjecture (4) instead of $\Box(pc_1 = init \wedge pc_2 = init \supset \neg(t_1 < t_2 - 1))$ as
suggested by the system.)

This demonstrates a typical use of the methodology, in which the detailed 
feedback from the system together with a moderate amount of user ingenuity
yields the relevant auxiliary invariant. This is in contrast to an ordinary theorem 
proving process, in which the user usually has to invest much more effort to come 
up with suitable auxiliary invariants. 

5 Conclusion and Future Work 

As an attempt to address the problem of the significant user ingenuity that is 
required to come up with appropriate auxiliary invariants or with suitable ab- 
straction mappings, we have presented a new methodology for integrating static 
analysis, theorem proving and abstractions. The key features of our approach
are:

- It is incremental: each step is based on information obtained from failures
of previous steps. When the iterative process terminates, it yields a proof or
a counterexample.

- It is goal-directed: abstractions are guided by the subgoals of a failed proof
attempt.

- It is partially automatic: each component performs an automatic task, and the
user chooses which component to invoke at each step and how to apply it.

- User input is highly guided by information provided by the system.

- It is general, in principle, and not dependent on a particular implementation
of the components.
For the experiments described in the paper we have used PVS [ORS+95] for
theorem proving and the Invariant Checker [GS97] for static analysis and
abstraction. We are currently building a tool that would incorporate SMV [BCM+92] for
trace analysis and simulation, and would also offer a connection to other static
analysis tools [HPR97] as well as more general abstraction techniques [BLO98].

Acknowledgments. We wish to thank John Rushby and Natarajan Shankar 
for valuable comments, Sam Owre for lending us help with Pvs, and Hassen 
Saidi for assisting us with the Invariant Checker. 
Abstract. Symbolic Model Checking [3, 14] has proven to be a powerful
technique for the verification of reactive systems. BDDs [2] have traditionally been
used as a symbolic representation of the system. In this paper we show how 
boolean decision procedures, like Stalmarck’s Method [16] or the Davis & Put- 
nam Procedure [7], can replace BDDs. This new technique avoids the space blow 
up of BDDs, generates counterexamples much faster, and sometimes speeds up 
the verification. In addition, it produces counterexamples of minimal length. We 
introduce a bounded model checking procedure for LTL which reduces model 
checking to propositional satisfiability. We show that bounded LTL model
checking can be done without a tableau construction. We have implemented a model
checker BMC, based on bounded model checking, and preliminary results are 
presented. 



1 Introduction 

Model checking [4] is a powerful technique for verifying reactive systems. Able to find 
subtle errors in real commercial designs, it is gaining wide industrial acceptance. Com- 
pared to other formal verification techniques (e.g. theorem proving) model checking is 
largely automatic. 

In model checking, the specification is expressed in temporal logic and the sys- 
tem is modeled as a finite state machine. For realistic designs, the number of states of 
the system can be very large and the explicit traversal of the state space becomes in- 
feasible. Symbolic model checking [3, 14], with boolean encoding of the finite state 
machine, can handle more than $10^{20}$ states. BDDs [2], a canonical form for boolean
expressions, have traditionally been used as the underlying representation for symbolic 
model checkers [14]. Model checkers based on BDDs are usually able to handle sys- 
tems with hundreds of state variables. However, for larger systems the BDDs generated 
during model checking become too large for currently available computers. In addition, 
selecting the right ordering of BDD variables is very important. The generation of a
variable ordering that results in small BDDs is often time consuming or needs manual 
intervention. For many examples no space efficient variable ordering exists. 

Propositional decision procedures (SAT) [7] also operate on boolean expressions 
but do not use canonical forms. They do not suffer from the potential space explosion 
of BDDs and can handle propositional satisfiability problems with thousands of vari- 
ables. SAT based techniques have been successfully applied in various domains, such 
as hardware verification [17], modal logics [9], formal verification of railway control 
systems [1], and AI planning systems [11]. A number of efficient implementations are 
available. Some notable examples are the PROVE tool [1] based on Stalmarck’s Method 
[16], and SATO [18] based on the Davis & Putnam Procedure [7]. 

In this paper we present a symbolic model checking technique based on SAT pro- 
cedures. The basic idea is to consider counterexamples of a particular length k and 
generate a propositional formula that is satisfiable iff such a counterexample exists. In 
particular, we introduce the notion of bounded model checking, where the bound is the 
maximal length of a counterexample. We show that bounded model checking for lin- 
ear temporal logic (LTL) can be reduced to propositional satisfiability in polynomial 
time. To prove the correctness and completeness of our technique, we establish a cor- 
respondence between bounded model checking and model checking in general. Unlike 
previous approaches to LTL model checking, our method does not require a tableau or 
automaton construction. 

The main advantages of our technique are the following. First, bounded model
checking finds counterexamples very fast. This is due to the depth-first nature of SAT
search procedures. Finding counterexamples is arguably the most important feature of 
model checking. Second, it finds counterexamples of minimal length. This feature helps 
the user to understand a counterexample more easily. Third, bounded model check- 
ing uses much less space than BDD based approaches. Finally, unlike BDD based ap- 
proaches, bounded model checking does not need a manually selected variable order or 
time consuming dynamic reordering. Default splitting heuristics are usually sufficient. 

To evaluate our ideas we have implemented a tool BMC based on bounded model
checking. We give examples in which SAT-based model checking significantly
outperforms BDD-based model checking. In some cases bounded model checking detects
errors instantly, while the BDDs for the initial state cannot be built.

The paper is organized as follows. In the following section we explain the basic 
idea of bounded model checking with an example. In Section 3 we give the semantics 
for bounded model checking. Section 4 explains the translation of a bounded model 
checking problem into a propositional satisfiability problem. In Section 5 we discuss 
bounds on the length of counterexamples. In Section 6 our experimental results are 
presented, and Section 7 describes some directions for future research. 



2 Example 



Consider the following simple state machine $M$ that consists of a three-bit shift register
$x$ with the individual bits denoted by $x[0]$, $x[1]$, and $x[2]$. The predicate $T(x, x')$ denotes
the transition relation between current state values $x$ and next state values $x'$ and is
equivalent to:

$$ (x'[0] = x[1]) \land (x'[1] = x[2]) \land (x'[2] = 1) $$

In the initial state the content of the register $x$ can be arbitrary. The predicate $I(x)$ that
denotes the set of initial states is $\mathit{true}$.

This shift register is meant to be empty (all bits set to zero) after three consecutive
shifts. But we introduced an error in the transition relation for the next state value
of $x[2]$, where an incorrect value 1 is used instead of 0. Therefore, the property that
eventually the register will be empty (written as $x = 0$) after a sufficiently large number
of steps is not valid. This property can be formulated as the LTL formula $\mathbf{F}(x = 0)$.
We translate the "universal" model checking problem $\mathbf{AF}(x = 0)$ into the "existential"
model checking problem $\mathbf{EG}(x \neq 0)$ by negating the formula. Then, we check if there
is an execution sequence that fulfills $\mathbf{G}(x \neq 0)$. Instead of searching for an arbitrary
path, we restrict ourselves to paths that have at most $k + 1$ states; for instance, we choose
$k = 2$. Call the first three states of this path $x_0$, $x_1$ and $x_2$ and let $x_0$ be the initial state (see
Figure 1). Since the initial content of $x$ can be arbitrary, we do not have any restriction





Fig. 1. Unrolling the transition relation twice and adding a back loop.



on $x_0$. We unroll the transition relation twice and derive the propositional formula $f_M$
defined as $I(x_0) \land T(x_0, x_1) \land T(x_1, x_2)$. We expand the definitions of $T$ and $I$, and get the
following formula.

$$
\begin{aligned}
&(x_1[0] = x_0[1]) \land (x_1[1] = x_0[2]) \land (x_1[2] = 1)\ \land && \text{(1st step)}\\
&(x_2[0] = x_1[1]) \land (x_2[1] = x_1[2]) \land (x_2[2] = 1) && \text{(2nd step)}
\end{aligned}
$$

Any path with three states that is a "witness" for $\mathbf{G}(x \neq 0)$ must contain a loop. Thus,
we require that there is a transition from $x_2$ back to the initial state, to the second state,
or to itself (see also Figure 1). We represent this transition as $L_i$ defined as $T(x_2, x_i)$,
which is equivalent to the following formula.

$$ (x_i[0] = x_2[1]) \land (x_i[1] = x_2[2]) \land (x_i[2] = 1) $$

Finally, we have to make sure that this path will fulfill the constraints imposed by the
formula $\mathbf{G}(x \neq 0)$. In this case the property $S_i$ defined as $x_i \neq 0$ has to hold at each state.
$S_i$ is equivalent to the following formula.

$$ (x_i[0] = 1) \lor (x_i[1] = 1) \lor (x_i[2] = 1) $$
Putting this all together we derive the following propositional formula.

$$ f_M \land \bigvee_{i=0}^{2} L_i \land \bigwedge_{i=0}^{2} S_i \qquad (1) $$

This formula is satisfiable iff there is a counterexample of length 2 for the original
formula $\mathbf{F}(x = 0)$. In our example we find a satisfying assignment for (1) by setting
$x_i[j] := 1$ for all $i, j = 0, 1, 2$.
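To make the construction concrete, the following minimal Python sketch checks formula (1) for the buggy shift register. It is only an illustration under simplifying assumptions: instead of handing (1) to a SAT procedure it enumerates all $2^9$ assignments to $x_0, x_1, x_2$, and the names `T`, `I` and `formula_1` are hypothetical.

```python
from itertools import product

# A state of the 3-bit register is a tuple (x[0], x[1], x[2]) of 0/1 values.
def T(x, y):
    # Buggy transition relation of the example: x'[2] becomes 1 instead of 0.
    return y[0] == x[1] and y[1] == x[2] and y[2] == 1

def I(x):
    # The initial register content is arbitrary.
    return True

def formula_1(x0, x1, x2):
    path = (x0, x1, x2)
    f_M = I(x0) and T(x0, x1) and T(x1, x2)
    loop = any(T(x2, xi) for xi in path)        # L_0 or L_1 or L_2
    never_empty = all(any(xi) for xi in path)   # S_0 and S_1 and S_2 (x_i != 0)
    return f_M and loop and never_empty

states = list(product((0, 1), repeat=3))
witnesses = [(a, b, c) for a in states for b in states for c in states
             if formula_1(a, b, c)]

print(len(witnesses) > 0)             # True: formula (1) is satisfiable
print(((1, 1, 1),) * 3 in witnesses)  # True: the assignment given in the text
```

Every witness found this way is a lasso-shaped counterexample of length 2 to $\mathbf{F}(x = 0)$; a real implementation would instead encode (1) as clauses for a SAT solver.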

3 Semantics 

ACTL* is defined as the subset of formulas of CTL* [8] that are in negation normal
form and contain only universal path quantifiers. A formula is in negation normal
form (NNF) if negations only occur in front of atomic propositions. ECTL* is
defined in the same way, but only existential path quantifiers are allowed. We consider
the next time operator 'X', the eventuality operator 'F', the globally operator 'G', and
the until operator 'U'. We assume that formulas are in NNF. We can always transform
a formula into NNF without increasing its size by including the release operator 'R'
($f \,\mathbf{R}\, g \equiv \neg(\neg f \,\mathbf{U}\, \neg g)$). In an LTL formula no path quantifiers (E or A) are allowed. In
this paper we concentrate on LTL model checking. Our technique can be extended to
handle full ACTL* (resp. ECTL*).

Definition 1. A Kripke structure is a tuple $M = (S, I, T, \ell)$ with a finite set of states $S$,
the set of initial states $I \subseteq S$, a transition relation between states $T \subseteq S \times S$, and the
labeling of the states $\ell \colon S \to \mathcal{P}(A)$ with atomic propositions $A$.

We use Kripke structures as models in order to give the semantics of the logic. For
the rest of the paper we consider only Kripke structures for which we have a boolean
encoding. We require that $S = \{0,1\}^n$, and that each state can be represented by a vector of
state variables $s = (s(1), \dots, s(n))$ where $s(i)$ for $i = 1, \dots, n$ are propositional variables.

We define propositional formulas $f_I(s)$, $f_T(s,t)$ and $f_p(s)$ as: $f_I(s)$ iff $s \in I$, $f_T(s,t)$ iff
$(s,t) \in T$, and $f_p(s)$ iff $p \in \ell(s)$. For the rest of the paper we simply use $T(s,t)$ instead
of $f_T(s,t)$ etc. In addition, we require that every state has a successor state. That is, for
all $s \in S$ there is a $t \in S$ with $(s,t) \in T$. For $(s,t) \in T$ we also write $s \to t$. For an infinite
sequence of states $\pi = (s_0, s_1, \dots)$ we define $\pi(i) = s_i$ and $\pi^i = (s_i, s_{i+1}, \dots)$ for $i \in \mathbb{N}$.

An infinite sequence of states $\pi$ is a path if $\pi(i) \to \pi(i+1)$ for all $i \in \mathbb{N}$.

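Definition 2 (Semantics). Let $M$ be a Kripke structure, $\pi$ be a path in $M$ and $f$ be an
LTL formula. Then $\pi \models f$ ($f$ is valid along $\pi$) is defined as follows.

$$
\begin{array}{lcl}
\pi \models p & \text{iff} & p \in \ell(\pi(0))\\
\pi \models \neg p & \text{iff} & p \notin \ell(\pi(0))\\
\pi \models f \land g & \text{iff} & \pi \models f \text{ and } \pi \models g\\
\pi \models f \lor g & \text{iff} & \pi \models f \text{ or } \pi \models g\\
\pi \models \mathbf{G} f & \text{iff} & \forall i.\ \pi^i \models f\\
\pi \models \mathbf{F} f & \text{iff} & \exists i.\ \pi^i \models f\\
\pi \models \mathbf{X} f & \text{iff} & \pi^1 \models f\\
\pi \models f \,\mathbf{U}\, g & \text{iff} & \exists i\ [\ \pi^i \models g \text{ and } \forall j,\ j < i.\ \pi^j \models f\ ]\\
\pi \models f \,\mathbf{R}\, g & \text{iff} & \forall i\ [\ \pi^i \models g \text{ or } \exists j,\ j < i.\ \pi^j \models f\ ]
\end{array}
$$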
Definition 3 (Validity). An LTL formula $f$ is universally valid in a Kripke structure $M$
(in symbols $M \models \mathbf{A} f$) iff $\pi \models f$ for all paths $\pi$ in $M$ with $\pi(0) \in I$. An LTL formula $f$ is
existentially valid in a Kripke structure $M$ (in symbols $M \models \mathbf{E} f$) iff there exists a path
$\pi$ in $M$ with $\pi \models f$ and $\pi(0) \in I$.

Determining whether an LTL formula $f$ is existentially (resp. universally) valid in a
given Kripke structure is called an existential (resp. universal) model checking problem.

In conformance to the semantics of CTL* [8], it is clear that an LTL formula $f$ is
universally valid in a Kripke structure $M$ iff $\neg f$ is not existentially valid. In order to
solve the universal model checking problem, we negate the formula and show that the
existential model checking problem for the negated formula has no solution. Intuitively,
we are trying to find a counterexample, and if we do not succeed then the formula
is universally valid. Therefore, in the theory part of the paper we only consider the
existential model checking problem.

The basic idea of bounded model checking is to consider only a finite prefix of a path
that may be a solution to an existential model checking problem. We restrict the length
of the prefix by a certain bound $k$. In practice we progressively increase the bound,
looking for longer and longer possible counterexamples.

A crucial observation is that, though the prefix of a path is finite, it still might
represent an infinite path if there is a back loop from the last state of the prefix to any of the
previous states (see Figure 2(b)). If there is no such back loop (see Figure 2(a)), then
the prefix does not say anything about the infinite behavior of the path. For instance,
only a prefix with a back loop can represent a witness for $\mathbf{G} p$. Even if $p$ holds along all
the states from $s_0$ to $s_k$, but there is no back loop from $s_k$ to a previous state, then we
cannot conclude that we have found a witness for $\mathbf{G} p$, since $p$ might not hold at $s_{k+1}$.



Fig. 2. The two cases for a bounded path: (a) no loop, (b) $(k,l)$-loop.



Definition 4. For $l \le k$ we call a path $\pi$ a $(k,l)$-loop if $\pi(k) \to \pi(l)$ and $\pi = u \cdot v^\omega$ with
$u = (\pi(0), \dots, \pi(l-1))$ and $v = (\pi(l), \dots, \pi(k))$. We call $\pi$ simply a $k$-loop if there is
an $l \in \mathbb{N}$ with $l \le k$ for which $\pi$ is a $(k,l)$-loop.
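A $(k,l)$-loop is fully determined by its prefix $\pi(0), \dots, \pi(k)$ and the loop start $l$. The minimal Python sketch below (the function name `lasso_state` is hypothetical) maps any position $i$ of the infinite path $u \cdot v^\omega$ back to an index into that prefix; it only illustrates Definition 4.

```python
def lasso_state(k, l, i):
    """Index into the prefix (pi(0), ..., pi(k)) of the state pi(i) on a (k, l)-loop.

    The path is u . v^omega with u = pi(0..l-1) and v = pi(l..k), so after
    position k the path re-enters v at position l.
    """
    if not 0 <= l <= k:
        raise ValueError("a (k, l)-loop requires 0 <= l <= k")
    if i <= k:
        return i
    return l + (i - l) % (k - l + 1)

# A (2, 0)-loop repeats all three states; a (2, 2)-loop stutters on the last one.
print([lasso_state(2, 0, i) for i in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
print([lasso_state(2, 2, i) for i in range(5)])  # [0, 1, 2, 2, 2]
```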

We give a bounded semantics that is an approximation to the unbounded semantics
of Definition 2. It allows us to define the bounded model checking problem and in the
next section we will give a translation of a bounded model checking problem into a
satisfiability problem.

In the bounded semantics we only consider a finite prefix of a path. In particular,
we only use the first $k + 1$ states $(s_0, \dots, s_k)$ of a path to determine the validity of a
formula along that path. If a path is a $k$-loop then we simply maintain the original LTL
semantics, since all the information about this (infinite) path is contained in the prefix
of length $k$.

Definition 5 (Bounded Semantics for a Loop). Let $k \in \mathbb{N}$ and $\pi$ be a $k$-loop. Then an
LTL formula $f$ is valid along the path $\pi$ with bound $k$ (in symbols $\pi \models_k f$) iff $\pi \models f$.

Assume that $\pi$ is not a $k$-loop. Then the formula $f := \mathbf{F} p$ is valid along $\pi$ in the
unbounded semantics if we can find an index $i \in \mathbb{N}$ such that $p$ is valid along the suffix
$\pi^i$ of $\pi$. In the bounded semantics the $(k+1)$-th state $\pi(k)$ does not have a successor.
Therefore, we cannot define the bounded semantics recursively over suffixes (e.g. $\pi^i$) of
$\pi$. We keep the original $\pi$ instead but add a parameter $i$ in the definition of the bounded
semantics and use the notation $\models_k^i$. The parameter $i$ is the current position in the prefix
of $\pi$. In Lemma 7 we will show that $\pi \models_k^i f$ implies $\pi^i \models f$.

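Definition 6 (Bounded Semantics without a Loop). Let $k \in \mathbb{N}$, and let $\pi$ be a path
that is not a $k$-loop. Then an LTL formula $f$ is valid along $\pi$ with bound $k$ (in symbols
$\pi \models_k f$) iff $\pi \models_k^0 f$ where

$$
\begin{array}{lcl}
\pi \models_k^i p & \text{iff} & p \in \ell(\pi(i))\\
\pi \models_k^i \neg p & \text{iff} & p \notin \ell(\pi(i))\\
\pi \models_k^i f \land g & \text{iff} & \pi \models_k^i f \text{ and } \pi \models_k^i g\\
\pi \models_k^i f \lor g & \text{iff} & \pi \models_k^i f \text{ or } \pi \models_k^i g\\
\pi \models_k^i \mathbf{G} f & & \text{is always false}\\
\pi \models_k^i \mathbf{F} f & \text{iff} & \exists j,\ i \le j \le k.\ \pi \models_k^j f\\
\pi \models_k^i \mathbf{X} f & \text{iff} & i < k \text{ and } \pi \models_k^{i+1} f\\
\pi \models_k^i f \,\mathbf{U}\, g & \text{iff} & \exists j,\ i \le j \le k\ [\ \pi \models_k^j g \text{ and } \forall n,\ i \le n < j.\ \pi \models_k^n f\ ]\\
\pi \models_k^i f \,\mathbf{R}\, g & \text{iff} & \exists j,\ i \le j \le k\ [\ \pi \models_k^j f \text{ and } \forall n,\ i \le n \le j.\ \pi \models_k^n g\ ]
\end{array}
$$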
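The bounded semantics without a loop can be read directly as a recursive evaluator over a finite prefix. The sketch below assumes a hypothetical tuple encoding of NNF formulas (`("ap", p)`, `("not", p)`, `("and", f, g)`, `("X", f)`, `("U", f, g)`, ...) and represents the prefix by the label sets $\ell(\pi(0)), \dots, \ell(\pi(k))$; it illustrates the definition and is not taken from BMC.

```python
def holds(f, labels, i=0):
    """Bounded semantics pi |=^i_k f for a prefix that is NOT a k-loop.

    labels[j] is the set of atomic propositions true in pi(j), j = 0..k.
    """
    k = len(labels) - 1
    op = f[0]
    if op == "ap":
        return f[1] in labels[i]
    if op == "not":            # negation only in front of atomic propositions
        return f[1] not in labels[i]
    if op == "and":
        return holds(f[1], labels, i) and holds(f[2], labels, i)
    if op == "or":
        return holds(f[1], labels, i) or holds(f[2], labels, i)
    if op == "X":              # the last state of the prefix has no successor
        return i < k and holds(f[1], labels, i + 1)
    if op == "G":              # never valid on a prefix without a back loop
        return False
    if op == "F":
        return any(holds(f[1], labels, j) for j in range(i, k + 1))
    if op == "U":
        return any(holds(f[2], labels, j) and
                   all(holds(f[1], labels, n) for n in range(i, j))
                   for j in range(i, k + 1))
    if op == "R":
        return any(holds(f[1], labels, j) and
                   all(holds(f[2], labels, n) for n in range(i, j + 1))
                   for j in range(i, k + 1))
    raise ValueError(f"unknown operator {op!r}")

prefix = [{"p"}, {"p"}, {"p", "q"}]                      # k = 2
print(holds(("U", ("ap", "p"), ("ap", "q")), prefix))    # True
print(holds(("G", ("ap", "p")), prefix))                 # False: no back loop
```

The second call shows why the dualities between $\mathbf{G}$ and $\mathbf{F}$ fail in this semantics: $p$ holds in every state of the prefix, yet $\mathbf{G} p$ is rejected.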
Note that if $\pi$ is not a $k$-loop, then we say that $\mathbf{G} f$ is not valid along $\pi$ in the bounded
semantics with bound $k$ since $f$ might not hold along $\pi^{k+1}$. Similarly, the case for $f \,\mathbf{R}\, g$
where $g$ always holds and $f$ is never fulfilled has to be excluded. These constraints
imply that for the bounded semantics the duality of $\mathbf{G}$ and $\mathbf{F}$ ($\neg\mathbf{F} f \equiv \mathbf{G}\neg f$) and the
duality of $\mathbf{R}$ and $\mathbf{U}$ ($\neg(f \,\mathbf{U}\, g) \equiv (\neg f) \,\mathbf{R}\, (\neg g)$) no longer hold.

The existential and universal bounded model checking problems are defined in the
same manner as in Definition 3. Now we describe how the existential model checking
problem ($M \models \mathbf{E} f$) can be reduced to a bounded existential model checking problem
($M \models_k \mathbf{E} f$).

Lemma 7. Let $h$ be an LTL formula and $\pi$ a path, then $\pi \models_k h \Rightarrow \pi \models h$.

Proof. If $\pi$ is a $k$-loop then the conclusion follows by definition. In the other case we
assume that $\pi$ is not a loop. Then we prove by induction over the structure of $h$ and
$i \le k$ the stronger property $\pi \models_k^i h \Rightarrow \pi^i \models h$. We only consider the most complicated
case $h = f \,\mathbf{R}\, g$.

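$$
\begin{aligned}
\pi \models_k^i f \,\mathbf{R}\, g
&\Rightarrow \exists j,\ i \le j \le k\ \big[\ \pi \models_k^j f \text{ and } \forall n,\ i \le n \le j.\ \pi \models_k^n g\ \big] \\
&\Rightarrow \exists j,\ i \le j \le k\ \big[\ \pi^j \models f \text{ and } \forall n,\ i \le n \le j.\ \pi^n \models g\ \big] \\
&\Rightarrow \exists j,\ i \le j\ \big[\ \pi^j \models f \text{ and } \forall n,\ i \le n \le j.\ \pi^n \models g\ \big]
\end{aligned}
$$

Let $l = j - i$ and $n' = n - i$:

$$
\begin{aligned}
&\Rightarrow \exists l\ \big[\ \pi^{i+l} \models f \text{ and } \forall n',\ n' \le l.\ \pi^{i+n'} \models g\ \big] \\
&\Rightarrow \exists l\ \big[\ (\pi^i)^l \models f \text{ and } \forall n,\ n \le l.\ (\pi^i)^n \models g\ \big] \\
&\Rightarrow \forall n\ \big[\ (\pi^i)^n \models g \text{ or } \exists j,\ j < n.\ (\pi^i)^j \models f\ \big] \\
&\Rightarrow \pi^i \models f \,\mathbf{R}\, g
\end{aligned}
$$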

In the next-to-last step we used the following fact:

$$ \exists m\ \big[\ \pi^m \models f \text{ and } \forall l,\ l \le m.\ \pi^l \models g\ \big] \;\Rightarrow\; \forall n\ \big[\ \pi^n \models g \text{ or } \exists j,\ j < n.\ \pi^j \models f\ \big] $$

Assume that $m$ is the smallest number such that $\pi^m \models f$ and $\pi^l \models g$ for all $l$ with $l \le m$.
In the first case we consider $n > m$. Based on the assumption, there exists $j < n$ such
that $\pi^j \models f$ (choose $j = m$). The second case is $n \le m$. Because $\pi^l \models g$ for all $l \le m$ we
have $\pi^n \models g$ for all $n \le m$. Thus, for all $n$ we have proven that the disjunction on the
right hand side is fulfilled. □



Lemma 8. Let $f$ be an LTL formula and $M$ a Kripke structure. If $M \models \mathbf{E} f$ then there
exists $k \in \mathbb{N}$ with $M \models_k \mathbf{E} f$.

Proof. In [3, 5, 12] it is shown that an existential model checking problem for an LTL
formula $f$ can be reduced to FairCTL model checking of the formula $\mathbf{EG}\,\mathit{true}$ in a
certain product Kripke structure. This Kripke structure is the product of the original
Kripke structure and a "tableau" that is exponential in the size of the formula $f$ in the
worst case. If the LTL formula $f$ is existentially valid in $M$ then there exists a path
in the product structure that starts with an initial state and ends with a cycle in the
strongly connected component of fair states. This path can be chosen to be a $k$-loop
with $k$ bounded by $|S| \cdot 2^{|f|}$, which is the size of the product structure. If we project this
path onto its first component, the original Kripke structure, then we get a path $\pi$ that is
a $k$-loop and in addition fulfills $\pi \models f$. By definition of the bounded semantics this also
implies $\pi \models_k f$. □

The main theorem of this section states that, if we take all possible bounds into
account, then the bounded and unbounded semantics are equivalent.

Theorem 9. Let $f$ be an LTL formula, $M$ a Kripke structure. Then $M \models \mathbf{E} f$ iff there
exists $k \in \mathbb{N}$ with $M \models_k \mathbf{E} f$.



4 Translation 

In the previous section, we defined the semantics for bounded model checking. We now
reduce bounded model checking to propositional satisfiability. This reduction enables
us to use efficient propositional decision procedures to perform model checking.

Given a Kripke structure $M$, an LTL formula $f$ and a bound $k$, we will construct a
propositional formula ${[\![M,f]\!]}_k$. The variables $s_0, \dots, s_k$ in ${[\![M,f]\!]}_k$ denote a finite
sequence of states on a path $\pi$. Each $s_i$ is a vector of state variables. The formula ${[\![M,f]\!]}_k$
essentially represents constraints on $s_0, \dots, s_k$ such that ${[\![M,f]\!]}_k$ is satisfiable iff $f$ is
valid along $\pi$.

The size of ${[\![M,f]\!]}_k$ is polynomial in the size of $f$ if common subformulas are
shared (as in our tool BMC). It is quadratic in $k$ and linear in the size of the propositional
formulas for $T$, $I$ and the $p \in A$. Thus, existential bounded model checking can be
reduced in polynomial time to propositional satisfiability.

To construct ${[\![M,f]\!]}_k$, we first define a propositional formula ${[\![M]\!]}_k$ that constrains
$s_0, \dots, s_k$ to be on a valid path $\pi$ in $M$. Second, we give the translation of an LTL formula
$f$ to a propositional formula that constrains $\pi$ to satisfy $f$.

Definition 10 (Unfolding the Transition Relation). For a Kripke structure $M$, $k \in \mathbb{N}$:

$$ {[\![M]\!]}_k := I(s_0) \land \bigwedge_{i=0}^{k-1} T(s_i, s_{i+1}) $$
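As an illustration of Definition 10, the following sketch unfolds the transition relation of the buggy shift register from Section 2 into clauses. It assumes a CNF representation with DIMACS-style integer literals and a hypothetical variable numbering `var(i, b)` for bit $b$ of state $s_i$; since $I$ is $\mathit{true}$ in that example, $I(s_0)$ contributes no clauses.

```python
def var(i, b):
    """Hypothetical variable number for bit b of state s_i (3 bits per state)."""
    return 3 * i + b + 1

def iff(a, b):
    """CNF clauses for a <-> b."""
    return [[-a, b], [a, -b]]

def transition(i):
    """CNF for T(s_i, s_{i+1}) of the buggy 3-bit shift register."""
    cur, nxt = i, i + 1
    return (iff(var(nxt, 0), var(cur, 1)) +
            iff(var(nxt, 1), var(cur, 2)) +
            [[var(nxt, 2)]])            # x'[2] = 1 (the injected error)

def unfold(k):
    """Clauses for [[M]]_k = I(s_0) and T(s_0, s_1) and ... and T(s_{k-1}, s_k)."""
    clauses = []                        # I(s_0) = true adds nothing
    for i in range(k):
        clauses += transition(i)
    return clauses

print(len(unfold(2)))  # 10: two steps with five clauses each
```

The clause count grows linearly in $k$, matching the observation that the unfolding is linear in the size of $T$ and in the bound.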

Depending on whether a path is a $k$-loop or not (see Figure 2), we have two different
translations of the temporal formula $f$. In Definition 11 we describe the translation if
the path is not a loop (${[\![\cdot]\!]}_k^i$). The more technical translation where the path is a loop
(${}_l{[\![\cdot]\!]}_k^i$) is given in Definition 13.

Consider the formula $h := p \,\mathbf{U}\, q$ and a path $\pi$ that is not a $k$-loop for a given $k \in \mathbb{N}$
(see Figure 2(a)). Starting at $\pi^i$ for $i \in \mathbb{N}$ with $i \le k$ the formula $h$ is valid along $\pi^i$ with
respect to the bounded semantics iff there is a position $j$ with $i \le j \le k$ and $q$ holds
at $\pi(j)$. In addition, for all states $\pi(n)$ with $n \in \mathbb{N}$ starting at $\pi(i)$ up to $\pi(j-1)$ the
proposition $p$ has to be fulfilled. Therefore the translation is simply a disjunction over
all possible positions $j$ at which $q$ eventually might hold. For each of these positions
a conjunction is added that ensures that $p$ holds along the path from $\pi(i)$ to $\pi(j-1)$.
Similar reasoning leads to the translation of the other temporal operators.

The translation ${[\![\cdot]\!]}_k^i$ maps an LTL formula into a propositional formula. The
parameter $k$ is the length of the prefix of the path that we consider and $i$ is the current
position in this prefix (see Figure 2(a)). When we recursively process subformulas, $i$
changes but $k$ stays the same. Note that we define the translation of any formula $\mathbf{G} f$ as
$\mathit{false}$. This translation is consistent with the bounded semantics.



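Definition 11 (Translation of an LTL Formula without a Loop). For an LTL formula
$f$ and $k, i \in \mathbb{N}$, with $i \le k$:

$$
\begin{aligned}
{[\![p]\!]}_k^i &:= p(s_i) \\
{[\![\neg p]\!]}_k^i &:= \neg p(s_i) \\
{[\![f \land g]\!]}_k^i &:= {[\![f]\!]}_k^i \land {[\![g]\!]}_k^i \\
{[\![f \lor g]\!]}_k^i &:= {[\![f]\!]}_k^i \lor {[\![g]\!]}_k^i \\
{[\![\mathbf{G} f]\!]}_k^i &:= \mathit{false} \\
{[\![\mathbf{F} f]\!]}_k^i &:= \bigvee_{j=i}^{k} {[\![f]\!]}_k^j \\
{[\![\mathbf{X} f]\!]}_k^i &:= \text{if } i < k \text{ then } {[\![f]\!]}_k^{i+1} \text{ else } \mathit{false} \\
{[\![f \,\mathbf{U}\, g]\!]}_k^i &:= \bigvee_{j=i}^{k} \Big( {[\![g]\!]}_k^j \land \bigwedge_{n=i}^{j-1} {[\![f]\!]}_k^n \Big) \\
{[\![f \,\mathbf{R}\, g]\!]}_k^i &:= \bigvee_{j=i}^{k} \Big( {[\![f]\!]}_k^j \land \bigwedge_{n=i}^{j} {[\![g]\!]}_k^n \Big)
\end{aligned}
$$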
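The translation without a loop is a straightforward recursion over the formula. The sketch below builds a propositional formula tree rather than CNF; the tuple encodings and the names `translate` and `evaluate` are hypothetical, and no subformula sharing is attempted, so unlike the shared representation described above the tree can grow with repeated subformulas.

```python
def conj(parts):
    return ("and", list(parts))

def disj(parts):
    return ("or", list(parts))

def translate(f, i, k):
    """[[f]]^i_k without a loop; ("var", p, i) stands for p evaluated in s_i."""
    op = f[0]
    if op == "ap":
        return ("var", f[1], i)
    if op == "not":
        return ("nvar", f[1], i)
    if op == "and":
        return conj([translate(f[1], i, k), translate(f[2], i, k)])
    if op == "or":
        return disj([translate(f[1], i, k), translate(f[2], i, k)])
    if op == "G":
        return False
    if op == "F":
        return disj(translate(f[1], j, k) for j in range(i, k + 1))
    if op == "X":
        return translate(f[1], i + 1, k) if i < k else False
    if op == "U":
        return disj(conj([translate(f[2], j, k)] +
                         [translate(f[1], n, k) for n in range(i, j)])
                    for j in range(i, k + 1))
    if op == "R":
        return disj(conj([translate(f[1], j, k)] +
                         [translate(f[2], n, k) for n in range(i, j + 1)])
                    for j in range(i, k + 1))
    raise ValueError(f"unknown operator {op!r}")

def evaluate(phi, val):
    """Evaluate a formula tree; val(p, i) is the truth value of p in s_i."""
    if isinstance(phi, bool):
        return phi
    tag = phi[0]
    if tag == "var":
        return val(phi[1], phi[2])
    if tag == "nvar":
        return not val(phi[1], phi[2])
    if tag == "and":
        return all(evaluate(x, val) for x in phi[1])   # empty conjunction: true
    return any(evaluate(x, val) for x in phi[1])       # empty disjunction: false

print(translate(("F", ("ap", "p")), 0, 2))
# ('or', [('var', 'p', 0), ('var', 'p', 1), ('var', 'p', 2)])
```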
Now we consider the case where the path is a $k$-loop. The translation ${}_l{[\![\cdot]\!]}_k^i$ of an
LTL formula depends on the current position $i$ and on the length of the prefix $k$. It also
depends on the position where the loop starts (see Figure 2(b)). This position is denoted
by $l$ for loop.

Definition 12 (Successor in a Loop). Let $k, l, i \in \mathbb{N}$, with $l, i \le k$. Define the successor
$\mathit{succ}(i)$ of $i$ in a $(k,l)$-loop as $\mathit{succ}(i) := i + 1$ for $i < k$ and $\mathit{succ}(i) := l$ for $i = k$.
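The successor function is small enough to state directly in code; this is a minimal sketch with the hypothetical name `succ`, checking the side conditions of the definition.

```python
def succ(i, k, l):
    """Successor of position i in a (k, l)-loop."""
    if not (0 <= l <= k and 0 <= i <= k):
        raise ValueError("requires 0 <= l <= k and 0 <= i <= k")
    return i + 1 if i < k else l

print([succ(i, 3, 1) for i in range(4)])  # [1, 2, 3, 1]
```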



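Definition 13 (Translation of an LTL Formula for a Loop). Let $f$ be an LTL formula,
$k, l, i \in \mathbb{N}$, with $l, i \le k$.

$$
\begin{aligned}
{}_l{[\![p]\!]}_k^i &:= p(s_i) \\
{}_l{[\![\neg p]\!]}_k^i &:= \neg p(s_i) \\
{}_l{[\![f \land g]\!]}_k^i &:= {}_l{[\![f]\!]}_k^i \land {}_l{[\![g]\!]}_k^i \\
{}_l{[\![f \lor g]\!]}_k^i &:= {}_l{[\![f]\!]}_k^i \lor {}_l{[\![g]\!]}_k^i \\
{}_l{[\![\mathbf{G} f]\!]}_k^i &:= \bigwedge_{j=\min(i,l)}^{k} {}_l{[\![f]\!]}_k^j \\
{}_l{[\![\mathbf{F} f]\!]}_k^i &:= \bigvee_{j=\min(i,l)}^{k} {}_l{[\![f]\!]}_k^j \\
{}_l{[\![\mathbf{X} f]\!]}_k^i &:= {}_l{[\![f]\!]}_k^{\mathit{succ}(i)} \\
{}_l{[\![f \,\mathbf{U}\, g]\!]}_k^i &:= \bigvee_{j=i}^{k} \Big( {}_l{[\![g]\!]}_k^j \land \bigwedge_{n=i}^{j-1} {}_l{[\![f]\!]}_k^n \Big) \lor \bigvee_{j=l}^{i-1} \Big( {}_l{[\![g]\!]}_k^j \land \bigwedge_{n=i}^{k} {}_l{[\![f]\!]}_k^n \land \bigwedge_{n=l}^{j-1} {}_l{[\![f]\!]}_k^n \Big) \\
{}_l{[\![f \,\mathbf{R}\, g]\!]}_k^i &:= \bigwedge_{j=\min(i,l)}^{k} {}_l{[\![g]\!]}_k^j \lor \bigvee_{j=i}^{k} \Big( {}_l{[\![f]\!]}_k^j \land \bigwedge_{n=i}^{j} {}_l{[\![g]\!]}_k^n \Big) \lor \bigvee_{j=l}^{i-1} \Big( {}_l{[\![f]\!]}_k^j \land \bigwedge_{n=i}^{k} {}_l{[\![g]\!]}_k^n \land \bigwedge_{n=l}^{j} {}_l{[\![g]\!]}_k^n \Big)
\end{aligned}
$$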
The translation of the formula depends on the shape of the path (whether it is a loop
or not). We now define a loop condition to distinguish these cases.

Definition 14 (Loop Condition). For $k, l \in \mathbb{N}$, let ${}_lL_k := T(s_k, s_l)$ and $L_k := \bigvee_{l=0}^{k} {}_lL_k$.

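Definition 15 (General Translation). Let $f$ be an LTL formula, $M$ a Kripke structure
and $k \in \mathbb{N}$:

$$ {[\![M,f]\!]}_k := {[\![M]\!]}_k \land \Big( \big(\neg L_k \land {[\![f]\!]}_k^0\big) \lor \bigvee_{l=0}^{k} \big({}_lL_k \land {}_l{[\![f]\!]}_k^0\big) \Big) $$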

The left side of the disjunction is the case where there is no back loop and the
translation without a loop is used. On the right side all possible starts $l$ of a loop are
tried and the translation for a $(k,l)$-loop is conjoined with the corresponding loop
condition ${}_lL_k$.
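The general translation can be checked by brute force on a tiny explicit-state model. The sketch below is an illustration under strong assumptions, not BMC: it enumerates all candidate paths $s_0, \dots, s_k$ instead of calling a SAT procedure, supports only atoms, $\land$, $\lor$, $\mathbf{X}$, $\mathbf{F}$ and $\mathbf{G}$, and evaluates the loop case through the unbounded semantics on the lasso (Definition 5) rather than through the propositional loop translation. All names are hypothetical.

```python
from itertools import product

def eval_noloop(f, path, k, i=0):
    """Bounded semantics without a loop (Definition 6) for atoms, and, or, X, F, G."""
    op = f[0]
    if op == "atom":
        return f[1](path[i])
    if op == "and":
        return eval_noloop(f[1], path, k, i) and eval_noloop(f[2], path, k, i)
    if op == "or":
        return eval_noloop(f[1], path, k, i) or eval_noloop(f[2], path, k, i)
    if op == "X":
        return i < k and eval_noloop(f[1], path, k, i + 1)
    if op == "F":
        return any(eval_noloop(f[1], path, k, j) for j in range(i, k + 1))
    if op == "G":
        return False  # G f never holds on a prefix without a back loop
    raise ValueError(f"unsupported operator {op!r}")

def eval_loop(f, path, k, l, i=0):
    """Unbounded LTL semantics along the (k, l)-loop through path[0..k]."""
    op = f[0]
    if op == "atom":
        return f[1](path[i])
    if op == "and":
        return eval_loop(f[1], path, k, l, i) and eval_loop(f[2], path, k, l, i)
    if op == "or":
        return eval_loop(f[1], path, k, l, i) or eval_loop(f[2], path, k, l, i)
    if op == "X":
        return eval_loop(f[1], path, k, l, i + 1 if i < k else l)  # succ(i)
    visited = range(min(i, l), k + 1)  # positions the lasso visits from i on
    if op == "F":
        return any(eval_loop(f[1], path, k, l, j) for j in visited)
    if op == "G":
        return all(eval_loop(f[1], path, k, l, j) for j in visited)
    raise ValueError(f"unsupported operator {op!r}")

def bmc(states, init, trans, f, k):
    """Return some s_0..s_k satisfying [[M, f]]_k, or None."""
    for path in product(states, repeat=k + 1):
        # [[M]]_k: an initial state followed by k transitions
        if not (init(path[0]) and all(trans(path[i], path[i + 1]) for i in range(k))):
            continue
        loops = [l for l in range(k + 1) if trans(path[k], path[l])]  # each lL_k
        if not loops and eval_noloop(f, path, k):       # not L_k and [[f]]^0_k
            return path
        if any(eval_loop(f, path, k, l) for l in loops):  # lL_k and l[[f]]^0_k
            return path
    return None

# The buggy and a corrected 3-bit shift register from Section 2.
states = list(product((0, 1), repeat=3))
init = lambda s: True
buggy = lambda x, y: y == (x[1], x[2], 1)
fixed = lambda x, y: y == (x[1], x[2], 0)
nonzero = ("G", ("atom", lambda s: any(s)))  # G(x != 0)

print(bmc(states, init, buggy, nonzero, 0))  # ((1, 1, 1),)
print(bmc(states, init, fixed, nonzero, 2))  # None
```

Note that the buggy register already has a witness for $k = 0$: the state $x = 111$ loops to itself. The example in Section 2 used $k = 2$ only for illustration.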

Theorem 16. ${[\![M,f]\!]}_k$ is satisfiable iff $M \models_k \mathbf{E} f$.



Corollary 17. $M \models \mathbf{A}\neg f$ iff ${[\![M,f]\!]}_k$ is unsatisfiable for all $k \in \mathbb{N}$.

5 Determining the bound

In Section 3 we have shown that the unbounded semantics is equivalent to the bounded
semantics if we consider all possible bounds. This equivalence leads to a straightforward
LTL model checking procedure. To check whether $M \models \mathbf{E} f$, the procedure checks
$M \models_k \mathbf{E} f$ for $k = 0, 1, 2, \dots$. If $M \models_k \mathbf{E} f$, then the procedure proves that $M \models \mathbf{E} f$ and
produces a witness of length $k$. If $M \not\models \mathbf{E} f$, we have to increment the value of $k$
indefinitely, and the procedure does not terminate. In this section we establish several bounds
on $k$. If $M \not\models_k \mathbf{E} f$ for all $k$ within the bound, we conclude that $M \not\models \mathbf{E} f$.
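The incremental procedure can be sketched as a loop over $k$. In the sketch below the SAT check for $M \models_k \mathbf{E} f$ is replaced, purely for illustration, by an explicit-state breadth-first search for the ECTL formula $\mathbf{EF} p$; the names `ef_witness` and `bounded_search` are hypothetical. Because $k$ grows one step at a time, the first witness found has minimal length.

```python
from collections import deque

def ef_witness(init, successors, p, k):
    """Explicit-state stand-in for the check M |=_k EF p: a path of at most
    k steps from an initial state to a state satisfying p, or None."""
    frontier = deque((s, (s,)) for s in init)
    seen = set(init)
    while frontier:
        s, path = frontier.popleft()
        if p(s):
            return path
        if len(path) - 1 == k:          # already k steps: do not extend
            continue
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append((t, path + (t,)))
    return None

def bounded_search(check, max_bound):
    """Try k = 0, 1, ..., max_bound and return (k, witness) for the first hit.

    None only means that no witness exists up to max_bound; it proves the
    absence of a witness only if max_bound reaches a valid bound on k."""
    for k in range(max_bound + 1):
        witness = check(k)
        if witness is not None:
            return k, witness
    return None

succs = {0: [1], 1: [2], 2: [3], 3: [3]}
print(bounded_search(lambda k: ef_witness([0], lambda s: succs[s],
                                          lambda s: s == 3, k), 10))
# (3, (0, 1, 2, 3))
```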

5.1 ECTL 

ECTL is a subset of ECTL* where each temporal operator is preceded by one existential
path quantifier. We have extended bounded model checking to handle ECTL formulas.
Semantics and translation for ECTL formulas can be found in the full version of this
paper. In general, better bounds can be derived for ECTL formulas than for LTL
formulas. The intersection of the two sets of formulas includes many temporal properties of
practical interest (e.g. $\mathbf{EF} p$ and $\mathbf{EG} p$). Therefore, we include the discussion of bounds
for ECTL formulas in this section.

Theorem 18. Given an ECTL formula $f$ and a Kripke structure $M$. Let $|M|$ be the
number of states in $M$, then $M \models \mathbf{E} f$ iff there exists $k \le |M|$ with $M \models_k \mathbf{E} f$.

In symbolic model checking, the number of states in a Kripke structure is bounded
by $2^n$, where $n$ is the number of boolean variables to encode the Kripke structure.
Typical model checking problems involve Kripke structures with tens or hundreds of
boolean variables. The bound given in Theorem 18 is often too large for practical
problems.

Definition 19 (Diameter). Given a Kripke structure $M$, the diameter of $M$ is the
minimal number $d \in \mathbb{N}$ with the following property. For every sequence of states $s_0, \dots, s_{d+1}$
with $(s_i, s_{i+1}) \in T$ for $i \le d$, there exists a sequence of states $t_0, \dots, t_l$ where $l \le d$ such
that $t_0 = s_0$, $t_l = s_{d+1}$ and $(t_j, t_{j+1}) \in T$ for $j < l$. In other words, if a state $v$ is reachable
from a state $u$, then $v$ is reachable from $u$ via a path of length $d$ or less.

Theorem 20. Given an ECTL formula $f := \mathbf{EF} p$ and a Kripke structure $M$ with
diameter $d$, $M \models \mathbf{EF} p$ iff there exists $k \le d$ with $M \models_k \mathbf{EF} p$.

Theorem 21. Given a Kripke structure $M$, its diameter $d$ is the minimal number that
satisfies the following formula.

$$ \forall s_0, \dots, s_{d+1}.\ \exists t_0, \dots, t_d.\ \bigwedge_{i=0}^{d} T(s_i, s_{i+1}) \rightarrow \Big( t_0 = s_0 \land \bigwedge_{i=0}^{d-1} T(t_i, t_{i+1}) \land \bigvee_{i=0}^{d} t_i = s_{d+1} \Big) $$

For a Kripke structure with explicit state representation, well-known graph
algorithms can be used to determine its diameter. For a Kripke structure $M$ with a boolean
encoding, one may verify that $d$ is indeed a diameter of $M$ by evaluating a quantified
boolean formula (QBF), shown in Theorem 21. We conjecture that a quantified boolean
formula is necessary to express the property that $d$ is the diameter of $M$. Unfortunately,
we do not know of an efficient decision procedure for QBF.
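For an explicitly represented Kripke structure, the diameter of Definition 19 equals the largest breadth-first-search distance from any state to a state reachable from it. A minimal sketch, assuming the transition relation is given as a hypothetical successor dictionary:

```python
from collections import deque

def bfs_distances(succ, source):
    """Shortest number of steps from source to every state reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        for t in succ[s]:
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return dist

def diameter(succ):
    """Minimal d such that every state reachable from u is reachable in <= d steps."""
    return max(max(bfs_distances(succ, u).values()) for u in succ)

cycle4 = {0: [1], 1: [2], 2: [3], 3: [0]}
complete3 = {s: [0, 1, 2] for s in range(3)}
print(diameter(cycle4), diameter(complete3))  # 3 1
```

This runs one search per state, which is exactly what becomes infeasible for the $2^n$ states of a symbolically encoded structure.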
Definition 22 (Recurrence Diameter). Given a Kripke structure $M$, its recurrence
diameter is the minimal number $d \in \mathbb{N}$ with the following property. For every sequence
of states $s_0, \dots, s_{d+1}$ with $(s_i, s_{i+1}) \in T$ for $i \le d$, there exists $j \le d$ such that $s_{d+1} = s_j$.

Theorem 23. Given an ECTL formula $f$ and a Kripke structure $M$ with recurrence
diameter $d$, $M \models \mathbf{E} f$ iff there exists $k \le d$ with $M \models_k \mathbf{E} f$.

Theorem 24. Given any Kripke structure $M$, its recurrence diameter $d$ is the minimal
number that satisfies the following formula

$$ \forall s_0, \dots, s_{d+1}.\ \bigwedge_{i=0}^{d} T(s_i, s_{i+1}) \rightarrow \bigvee_{i=0}^{d} s_i = s_{d+1} $$

The recurrence diameter in Definition 22 is a bound on k for bounded model check- 
ing that is applicable for all ECTL formulas. The property of a recurrence diameter can 
be expressed as a propositional formula as shown in Theorem 24. We may use a propo- 
sitional decision procedure to determine whether a number d is the recurrence diameter 
of a Kripke structure. The bound based on recurrence diameter is not as tight as that 
based on the diameter. For example, in a fully connected Kripke structure, the graph 
diameter is 1 while the recurrence diameter equals the number of states. 
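For very small explicit graphs the formula of Theorem 24 can simply be checked by enumerating all paths with $d + 2$ states, which makes the definition easy to experiment with. This is exponential in $d$ and is not the propositional approach described above; the successor-dictionary representation and the function names are hypothetical.

```python
def satisfies_thm24(succ, d):
    """Check Theorem 24's formula for a given d: on every path s_0 ... s_{d+1},
    the last state must equal some s_i with i <= d."""
    paths = [[s] for s in succ]
    for _ in range(d + 1):                      # extend paths to d + 2 states
        paths = [p + [t] for p in paths for t in succ[p[-1]]]
    return all(p[-1] in p[:-1] for p in paths)

def recurrence_diameter(succ, limit):
    """Smallest d <= limit that satisfies the formula, or None."""
    for d in range(limit + 1):
        if satisfies_thm24(succ, d):
            return d
    return None

cycle3 = {0: [1], 1: [2], 2: [0]}
print(recurrence_diameter(cycle3, 10))  # 2
```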

5.2 LTL 

LTL model checking is known to be PSPACE-complete [15]. In Section 4, we reduced
bounded LTL model checking to propositional satisfiability and thus showed that it is in
NP. Therefore, a polynomial bound on $k$ with respect to the size of $M$ and $f$ for which
$M \models_k \mathbf{E} f \Leftrightarrow M \models \mathbf{E} f$ is unlikely to be found. Otherwise, there would be a
polynomial reduction of LTL model checking problems to propositional satisfiability and thus
PSPACE = NP.

Theorem 25. Given an LTL formula $f$ and a Kripke structure $M$, let $|M|$ be the number
of states in $M$, then $M \models \mathbf{E} f$ iff there exists $k \le |M| \times 2^{|f|}$ with $M \models_k \mathbf{E} f$.

For the subset of LTL formulas that involves only temporal operators F and G, LTL 
model checking is NP-complete [15]. For this subset of LTL formulas, it can be shown 
that there exists a bound on k linear in the number of states and the size of the formula. 

Definition 26 (Loop Diameter). We say a Kripke structure $M$ is lasso shaped if every
path $p$ starting from an initial state is of the form $u_p v_p^\omega$, where $u_p$ and $v_p$ are finite
sequences of length less than or equal to $u$ and $v$, respectively. We define the loop diameter
of $M$ as $(u, v)$.

Theorem 27. Given an LTL formula $f$ and a lasso-shaped Kripke structure $M$, let the
loop diameter of $M$ be $(u, v)$, then $M \models \mathbf{E} f$ iff there exists $k \le u + v$ with $M \models_k \mathbf{E} f$.

Theorem 27 shows that for a restricted class of Kripke structures, small bounds on
$k$ exist. In particular, if a Kripke structure is lasso shaped, $k$ is bounded by $u + v$, where
$(u, v)$ is the loop diameter of $M$.
6 Experimental Results



We have implemented a model checker BMC based on bounded model checking. Its
input language is a subset of the SMV language [14]. It outputs an SMV program or
a propositional formula. For the propositional output mode, two different formats are
supported. The first format is the DIMACS format [10] for satisfiability problems. The
SATO tool [18] is a very efficient implementation of the Davis & Putnam Procedure [7]
and it uses the DIMACS format. We also support the input format of the PROVE tool
[1], which is based on Stalmarck’s Method [16].
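For reference, the DIMACS CNF format declares the number of variables and clauses in a header line `p cnf <variables> <clauses>` and lists each clause as whitespace-separated non-zero integer literals terminated by `0`; lines starting with `c` are comments. The following writer is a hypothetical illustration of that format (the name `to_dimacs` is invented here), not BMC's output code.

```python
def to_dimacs(clauses, num_vars=None):
    """Serialize CNF clauses (lists of non-zero ints) in DIMACS CNF format."""
    if num_vars is None:
        num_vars = max((abs(lit) for c in clauses for lit in c), default=0)
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines) + "\n"

print(to_dimacs([[1, -2], [2, 3], [-1]]), end="")
# p cnf 3 3
# 1 -2 0
# 2 3 0
# -1 0
```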

As benchmarks we chose examples where BDDs are known to behave badly. First
we investigated a sequential multiplier, the sequential shift and add multiplier of [6].
We formulated the following property as a model checking problem: when the sequential
multiplier is finished, its output is the same as the output of a combinational multiplier
(the C6288 circuit from the ISCAS’85 benchmarks) applied to the same input words.
These multipliers are 16x16 bit multipliers, but as in [6] we only allowed 16 output bits
together with an overflow bit. We proved the property for each output bit individually
and the results are shown in Table 1. For SATO we conducted two experiments to study
the effect of the ‘-g’ parameter that controls the maximal size of cached clauses. We
picked a very small value (‘-g 5’) and a very large value (‘-g 50’). Note that the overflow
bit depends on all the bits of the sequential multiplier and occurs in the specification.
Thus, cone of influence reduction could not remove anything.



     |    SMV1    |    SMV2    | SATO -g5  | SATO -g50 |  PROVE
 bit |   sec   MB |   sec   MB |   sec  MB |   sec  MB |  sec  MB
   0 |   919   13 |    25   79 |     0   0 |     0   1 |    0   1
   1 |  1978   13 |    25   79 |     0   0 |     0   1 |    0   1
   2 |  2916   13 |    26   80 |     0   0 |     0   2 |    0   1
   3 |  4744   13 |    27   82 |     0   0 |     0   3 |    1   2
   4 |  6580   15 |    33   92 |     2   0 |     3   4 |    1   2
   5 | 10803   25 |    67  102 |    12   0 |    36   7 |    1   2
   6 | 43983   73 |   258  172 |    55   0 |   208  10 |    2   2
   7 |  >17h      |  1741  492 |   209   0 |   642  13 |    7   3
   8 |            |    >1GB    |   473   0 |  1198  16 |   29   3
   9 |            |            |   856   1 |  2413  20 |   58   3
  10 |            |            |  1837   1 |  2055  20 |   91   3
  11 |            |            |  2367   1 |  1667  19 |  125   3
  12 |            |            |  3830   1 |   976  17 |  156   4
  13 |            |            |  5128   1 |  4363  25 |  186   4
  14 |            |            |  4752   1 |  2170  23 |  226   4
  15 |            |            |  4449   1 |  6847  31 |  183   5
 sum | 71923      |  2202      | 23970     | 22578     | 1066



Table 1. 16x16 bit sequential shift and add multiplier with overflow flag and 16 output 
bits (sec = seconds, MB = Mega Byte). 
In the column SMV1 of Table 1 the official version of the CMU model checker
SMV was used. SMV2 is a version by Bwolen Yang from CMU with improved support
for conjunctive partitioning. We used a manually chosen variable ordering where the
bits of registers are interleaved. Dynamic reordering failed to find a considerably better
ordering in a reasonable amount of time.

We used a barrel shifter as another example. It rotates the contents of a register file
$b$ with each step by one position. The model also contains another register file $r$ that is
related to $b$ in the following way. If a register in $r$ and one in $b$ have the same contents
then their neighbors also have the same contents. This property holds in the initial state
of the model, and we proved that it is valid in all successor states. The results of this
experiment can be found in Table 2. The width of the registers is chosen to be $\lceil \log_2 |r| \rceil$
where $|r|$ is the number of registers in the register file $r$. In this case we were also able
to prove the recurrence diameter (see Definition 22) to be $|r|$. This took only very little
time compared to the total verification time and is shown in the column "diameter".

In [13] an asynchronous circuit for distributed mutual exclusion is described. It
consists of $n$ cells for $n$ users that want to have exclusive access to a shared resource. We
proved the liveness property that a request for using the resource will eventually be
acknowledged. This liveness property is only true if each asynchronous gate does not
delay execution indefinitely. We model this assumption by a fairness constraint for each
individual gate. Each cell has exactly 18 gates and therefore the model has $n \cdot 18$ fairness
constraints where $n$ is the number of cells. Since we do not have a bound for the
maximal length of a counterexample for the verification of this circuit we could not verify
the liveness property completely. We only showed that there are no counterexamples of
particular length $k$. To illustrate the performance of bounded model checking we have
chosen $k = 5, 10$. The results can be found in Table 3.

We repeated the experiment with a buggy design. For the liveness property we sim- 
ply removed several fairness constraints. Both PROVE and SATO generate a counterex- 
ample (a 2-loop) instantly (see Table 4). 

7 Conclusion 

This work is the first step in applying SAT procedures to symbolic model checking. 
We believe that our technique has the potential to handle much larger designs than 
what is currently possible. Towards this goal, we propose several promising directions 
of research. We would like to investigate how to use domain knowledge to guide the 
search in SAT procedures. New techniques are needed to determine the diameter of a 
system. In particular, it would be interesting to study efficient decision procedures for 
QBF. Combining bounded model checking with other state space reduction techniques
presents another interesting problem. 
     |    SMV2    | SATO -g100 | SATO -g20 |   PROVE   |  PROVE
     |            | diameter   |           | diameter  |
 |r| |   sec   MB |   sec   MB |   sec  MB |  sec   MB |  sec  MB
   3 |     1   49 |     0    1 |     0   0 |    0    1 |    0   1
   4 |     1   49 |     0    1 |     0   1 |    0    1 |    0   1
   5 |    13   83 |     0    2 |    60   2 |    0    1 |    1   2
   6 |   509  447 |     1    4 |   364   4 |    0    1 |    2   3
   7 |    >1GB    |     3    6 |  1252   6 |    0    2 |    2   4
   8 |            |     5    8 |  2160   9 |    0    2 |    7   5
   9 |            |    25   14 |   >21h    |    0    3 |   16   9
  10 |            |    42   19 |           |    1    4 |   55  11



Table 2. Barrel shifter (|r| = number of registers, sec = seconds, MB = Mega Bytes). 
      |   SMV1    |   SMV2    |  SATO   |  PROVE  |  SATO    |  PROVE
      |           |           |  k = 5  |  k = 5  |  k = 10  |  k = 10
cells |   sec  MB |   sec  MB | sec  MB | sec  MB |  sec  MB |   sec  MB
    4 |   846  11 |   159 217 |   0   3 |   1   3 |    3   6 |    54   5
    5 |  2166  15 |   530 703 |   0   4 |   2   3 |    9   8 |    95   5
    6 |  4857  18 |  1762 703 |   0   4 |   3   3 |    7   9 |   149   6
    7 |  9985  24 |  6563 833 |   0   5 |   4   4 |   15  10 |   224   8
    8 | 19595  31 |   >1GB    |   1   6 |   6   5 |   16  12 |   323   8
    9 |   >10h    |           |   1   6 |   9   5 |   24  13 |   444   9
   10 |           |           |   1   7 |  10   5 |   36  15 |   614  10
   11 |           |           |   1   8 |  13   6 |   38  16 |   820  11
   12 |           |           |   1   9 |  16   6 |   40  18 |  1044  11
   13 |           |           |   1   9 |  19   8 |  107  19 |  1317  12
   14 |           |           |   1  10 |  22   8 |   70  21 |  1634  14
   15 |           |           |   1  11 |  27   8 |  168  22 |  1992  15



Table 3. Liveness for one user in the DME (sec = seconds, MB = Mega Bytes). 
14

24

21

40

76

5622

0

9449

118

217

0

2

9

segmentation

172

220

1

1

2

10

0


1 


0 


3 


11 






413 


702 


0 


1 


0 


3 


12 






719 


702 


0 


2 


1 


3 


13 






843 


702 


0 


2 


1 


3 


14 






1060 702 


0 


2 


1 


3 


15 






1429 702 


0 


2 


1 


3 



Table 4. Counterexample for liveness in a buggy DME (sec = seconds, MB = Mega 
Bytes). 
Abstract. We consider the problem of automatically verifying infinite-state systems that consist of finite-state machines communicating by exchanging messages through unbounded lossy fifo channels. In previous work [1], we proposed an algorithmic approach based on constructing a symbolic representation of the set of reachable configurations of a system by means of a class of regular expressions (SREs). The construction of such a representation consists of an iterative computation with an acceleration technique which enhances the chance of convergence. This technique is based on the analysis of the effect of iterating control loops. In the work we present here, we experiment with our approach and show how it can be effectively applied. To this end, we developed a tool prototype based on the results in [1]. Using this tool, we provide an automatic verification of (the parameterized version of) the Bounded Retransmission Protocol.



1 Introduction 

Communication protocols are naturally modeled as an asynchronous parallel composition of finite-state machines that exchange messages through unbounded fifo channels. Moreover, in a large class of communication protocols, e.g., link protocols, channels are assumed to be lossy in the sense that they can lose messages at any time. It is therefore important to develop automatic analysis techniques for lossy channel systems.

Many verification problems, e.g., verification of safety properties, reduce to computing the set of reachable configurations. However, since lossy channel systems are infinite-state systems, this set cannot be constructed by enumerative search procedures, and a symbolic approach allowing finite representations of infinite sets of configurations must naturally be adopted. Moreover, it has been shown that there is no algorithm for computing reachability sets of lossy channel systems [8]. The approach we adopt is therefore to develop semi-algorithms based on a forward iterative computation with a mechanism for enhancing the chances of convergence. This mechanism is based on accelerating the calculation [18,9] by considering meta-transitions [6] corresponding to an arbitrary number of executions of control loops: in one step of the iterative computation, we

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 208-222, 1999.

© Springer-Verlag Berlin Heidelberg 1999
add successors by the transitions of the system as well as all the configurations reachable by iterating control loops. So, to realize this approach, we need a good symbolic representation which should be expressive and allow efficient performance of certain operations that are used in the computation of reachability sets, e.g., inclusion testing, computing successors by transitions of the system, as well as the effect of iterating control loops. In [1], we proposed a new symbolic representation formalism based on a class of regular expressions called SREs (simple regular expressions) for use in the reachability analysis of lossy channel systems. We showed in that work that SREs are good symbolic representations: SREs can define the reachability set of any lossy channel system (but not effectively in general), and all the needed operations on SREs are rather simple and can be carried out in polynomial time.

The aim of this paper is to show the power of the approach we adopt and how our results in [1] can be effectively applied. Based on these results, we developed a tool prototype, called Lcs. Given a lossy channel system, this tool automatically generates its set of reachable configurations by means of SREs, and produces a symbolic graph which constitutes a finite-state abstract model of the system. Furthermore, the tool allows on-the-fly verification of safety properties given by finite-state labelled transition systems. The Lcs tool is connected to the Cadp toolbox [11], which provides a variety of procedures on finite-state labelled transition systems, e.g., comparison and minimization w.r.t. behavioural equivalences, and model checking for temporal logics. For instance, it is possible to automatically generate a finite abstract model of a system using the Lcs tool, and then apply standard finite-state verification techniques to this abstract model.

We present an interesting experiment we have carried out with our tool, which consists of an automatic verification of the Bounded Retransmission Protocol (BRP) of Philips. The BRP is a data link protocol which can be seen as an extended version of the well-known alternating bit protocol. It consists of a sender and a receiver that communicate through two lossy channels. The service the protocol delivers is the transmission of large files seen as sequences of data of arbitrary length. In addition, both the sender and the receiver must indicate to their clients whether the whole file has been delivered successfully or not. The sender reads a sequence of data and successively transmits each datum in a separate frame, following a procedure similar to the alternating bit protocol. However, the sender can resend a non-acknowledged frame only up to a fixed number of retransmissions MAX, which is a parameter of the protocol. Our modeling of the BRP assumes that the sizes of the transmitted sequences and the value MAX can be arbitrary positive integers. The assumption concerning MAX leads to a model with unbounded channels which represents the whole family of BRPs with any value of MAX. This is an example where the model of unbounded channels allows parametric reasoning about a family of systems.

We use our Lcs tool to automatically generate the set of reachable configurations of the BRP and the corresponding finite symbolic graph (0.56 seconds on an UltraSparc). After projecting this graph on the set of external actions of the protocol and minimizing w.r.t. observational trace equivalence, we get an abstract model with 5 states and 10 transitions which corresponds exactly to the expected external behaviour of the protocol.

Related Work: There are several works on symbolic verification of perfect fifo-channel systems [20,12,4,5,7]. Pachl proposed to represent the set of reachable configurations of a protocol as a recognizable set (Cartesian product of regular sets), but he gave no procedures for computing such a representation. Finkel and Marcé proposed a symbolic analysis procedure using a class of regular expressions (not comparable with SREs), based on an analysis of the unbounded iterability of a control loop [12]. The set of configurations computed by this procedure is, however, an upper approximation of the reachability set. Boigelot et al. use finite automata (under the name of QDDs) to represent recognizable sets of configurations [4,5]. However, QDDs cannot characterize the effect of every control loop of a perfect fifo-channel system (restrictions on the type of loops are imposed in order to preserve recognizability). To compute and represent the effect of any control loop, structures called CQDDs, which combine finite automata with linear arithmetical constraints, must be used [7]. Our
work ([1] and this paper) takes advantage of the fact that we are specifically analysing lossy channel systems. For these systems, we propose a symbolic representation (SREs) which captures exactly the class of reachability sets of such systems. Then, while the operations on QDDs and CQDDs are of exponential complexity and are performed by quite non-trivial algorithms, all needed operations on SREs can be performed by much simpler algorithms and in polynomial time. Moreover, although QDDs and CQDDs are more expressive than SREs, the algorithms in [4,5,7] cannot simulate the ones we use on SREs. The reason is that lossy transitions are implicit in our model, whereas all transitions are explicitly represented in the algorithms in [4,5,7]. Thus, to simulate in [4,5,7] the effect of iterating a loop in the lossy channel model, we have to add transitions explicitly to model the losses. These transitions in general add new loops to the system, implying that a loop in the lossy channel system is simulated by a nested loop in the perfect channel system. However, the analysis of nested loops is not feasible in the approaches of [4,5,7].

Several works have addressed the specification and verification of the BRP. To tackle the problem of the unboundedness of the size of the transmitted files and of the parameter MAX, these works propose proof-based approaches using theorem provers, combined with abstraction techniques and model checking. In [14] the system and its external specification are described in μCRL and are proved to be (branching) bisimilar. The proof is carried out by hand and checked using Coq. An approach based on proving trace inclusion (instead of bisimulation) on I/O automata is developed in [17]. In [16] the theorem prover PVS is used to prove that the verification of the BRP can be reduced by means of abstraction to a finite-state problem that can be solved by model checking. In [13,3] a more automated approach is applied, based on automatically constructing a finite abstract model using PVS, for an explicitly given abstraction function.

It is possible to see the unbounded lossy channel system we use to model the BRP as an abstraction of the whole family of BRPs for all possible values of its parameters. But this model is infinite-state: the unboundedness of the parameters is in some sense transformed into an unboundedness of the channels. Then, starting from this infinite-state system, our verification technique is fully automatic. It is based on an automatic generation of a finite abstract model, without explicitly giving the abstraction relation. So, our work provides a fully automatic and efficient verification of the (untimed) parameterized version of the BRP.

Finally, we mention two works where the BRP has been verified automatically, but only for some fixed instances of its parameters: in [19], an untimed version of the BRP is verified using both a bisimulation-based approach and a model checking approach using Cadp. In [10] a timed version of the BRP is verified using the tools Spin and Uppaal. These two works avoid the issue of parameter unboundedness and use standard finite-state techniques. However, the work in [10] considers timing aspects that we have abstracted away, since our model is untimed.

Outline: In Section 2 we define the model of lossy channel systems. In Section 3 we present the verification approach we adopt. In Section 3.3 we present the class of SREs and give an overview of our results concerning this symbolic representation. In Section 4 we describe our tool prototype. In Section 5 we present our modeling and verification of the BRP. Concluding remarks are given in Section 6.

2 Lossy Channel Systems 

We consider system models consisting of asynchronous parallel compositions of finite-state machines that communicate by sending and receiving messages via a finite set of unbounded lossy fifo channels (in the sense that they can nondeterministically lose messages).

A Lossy Channel System (LCS) $\mathcal{L}$ is a tuple $(S, s_{init}, C, M, \Sigma, \delta)$, where

– $S$ is a finite set of (control) states. The control states of a system with $n$ finite-state machines are formed as the Cartesian product $S = S_1 \times \cdots \times S_n$ of the control states of each finite-state machine.

– $s_{init} \in S$ is an initial state. The initial state of a system with $n$ finite-state machines is a tuple $(s_{init_1}, \ldots, s_{init_n})$ of initial states of the components.

– $C$ is a finite set of channels,

– $M$ is a finite set of messages,

– $\Sigma$ is a finite set of transition (or action) labels,

– $\delta$ is a finite set of transitions, each of which is of the form $(s_1, \ell, Op, s_2)$, where $s_1$ and $s_2$ are states, $\ell \in \Sigma$, and $Op$ is a mapping from $C$ to (channel) operations. An operation is either a send operation $!a$, a receive operation $?a$, or an empty operation $nop$, where $a \in M$.

A configuration $\gamma$ of $\mathcal{L}$ is a pair $(s, w)$ where $s \in S$ is a control state, and $w$ is a mapping from $C$ to $M^*$ giving the contents of each channel. The initial configuration $\gamma_{init}$ of $\mathcal{L}$ is the pair $(s_{init}, \epsilon)$, where $\epsilon$ denotes the mapping where each channel is assigned the empty sequence $\epsilon$.

We define a labelled transition relation on configurations in the following manner: $(s_1, w_1) \xrightarrow{\ell} (s_2, w_2)$ if and only if there exists a transition $(s_1, \ell, Op, s_2) \in \delta$ such that, for each $c \in C$, we have: if $Op(c) = {!a}$, then $w_2(c) = w_1(c) \cdot a$; if $Op(c) = {?a}$, then $a \cdot w_2(c) = w_1(c)$; and if $Op(c) = nop$, then $w_2(c) = w_1(c)$.

Let $\preceq$ denote the subsequence relation on $M^*$. For two mappings $w$ and $w'$ from $C$ to $M^*$, we use $w \preceq w'$ to denote that $w(c) \preceq w'(c)$ for each $c \in C$. Then, we introduce a weak transition relation on configurations in the following manner: $(s_1, w_1) \overset{\ell}{\Longrightarrow} (s_2, w_2)$ if and only if there are $w_1'$ and $w_2'$ such that $w_1' \preceq w_1$, $w_2 \preceq w_2'$, and $(s_1, w_1') \xrightarrow{\ell} (s_2, w_2')$. Intuitively, $(s_1, w_1) \overset{\ell}{\Longrightarrow} (s_2, w_2)$ means that $(s_2, w_2)$ can be reached from $(s_1, w_1)$ by first losing messages from the channels and reaching $(s_1, w_1')$, then performing the transition $(s_1, w_1') \xrightarrow{\ell} (s_2, w_2')$, and thereafter losing messages from the channels. Given a configuration $\gamma$, we let $post(\gamma)$ denote the set of immediate successors of $\gamma$, i.e., $post(\gamma) = \{\gamma' : \exists \ell \in \Sigma.\ \gamma \overset{\ell}{\Longrightarrow} \gamma'\}$. The function $post$ is generalized to sets of configurations in the obvious manner. Then, we let $post^*$ denote the reflexive transitive closure of $post$, i.e., given a set of configurations $\Gamma$, $post^*(\Gamma)$ is the set of all configurations reachable from $\Gamma$. Let $Reach(\mathcal{L})$ be the set $post^*(\gamma_{init})$. For every control location $s \in S$, we define $\mathcal{R}(s) = \{w : (s, w) \in Reach(\mathcal{L})\}$.
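For a single concrete configuration the weak transition relation is finitely branching, since a word has only finitely many subsequences. The following Python sketch computes $post(\gamma)$ by brute force, directly from the definition. The encoding is our own assumption: channel contents are a tuple of words (one tuple of messages per channel), and an operation map sends a channel index to `("!", a)` or `("?", a)`, with absent channels performing $nop$. The enumeration is exponential in the channel contents, so it only illustrates the semantics; the tool works symbolically with SREs.

```python
from itertools import combinations, product

# Hypothetical encoding: contents is a tuple of words (tuples of messages), one
# per channel; op maps a channel index to ("!", a) or ("?", a); other channels do nop.

def subsequences(word):
    """All w' with w' ⪯ word (the subsequence relation)."""
    return {tuple(word[i] for i in idx)
            for k in range(len(word) + 1)
            for idx in combinations(range(len(word)), k)}

def losses(contents):
    """All channel contents obtainable by losing messages from each channel."""
    return set(product(*(subsequences(w) for w in contents)))

def perfect_step(contents, op):
    """Perfect-channel effect of op; None if a receive is not enabled."""
    new = list(contents)
    for c, (kind, a) in op.items():
        if kind == "!":
            new[c] = new[c] + (a,)
        elif kind == "?":
            if not new[c] or new[c][0] != a:
                return None
            new[c] = new[c][1:]
    return tuple(new)

def post(config, transitions):
    """Weak successors: lose messages, take a transition, lose messages again."""
    state, contents = config
    result = set()
    for s1, _label, op, s2 in transitions:
        if s1 != state:
            continue
        for before in losses(contents):
            after = perfect_step(before, op)
            if after is not None:
                result |= {(s2, w) for w in losses(after)}
    return result
```

For example, with one channel holding `("a", "b")`, a transition receiving `b` is enabled only after `a` has been lost, so the only weak successor has an empty channel.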

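A run of $\mathcal{L}$ starting from a configuration $\gamma$ is a finite or infinite sequence $\rho = \gamma_0 \ell_0 \gamma_1 \ell_1 \gamma_2 \cdots$ such that $\gamma_0 = \gamma$ and $\forall i \geq 0.\ \gamma_i \overset{\ell_i}{\Longrightarrow} \gamma_{i+1}$. The trace of the run $\rho$ is the sequence of action labels $\tau = \ell_0 \ell_1 \cdots$. We denote by $Traces(\mathcal{L})$ (resp. $Traces_f(\mathcal{L})$) the set of all traces (resp. finite traces) of $\mathcal{L}$ starting from the initial configuration $\gamma_{init}$.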

We introduce two extensions of the basic model given above. The first one consists of introducing channel emptiness testing: we use enabling conditions on transitions involving a predicate empty on channels telling whether a channel is empty. The second extension consists of allowing the components of a system to test and set shared boolean variables (recall that we consider here asynchronous parallel composition following the interleaving semantics). The formal semantics of the extended model is an obvious adaptation of the one given above.

3 Symbolic Reachability Analysis 

We adopt an algorithmic verification approach based on the computation of the set of reachable configurations. Below we explain the general principle we use to compute reachability sets, and how it can be applied to solve verification problems.

3.1 Computing reachability sets 

The basic question is how to construct the set $Reach(\mathcal{L})$ for any given system $\mathcal{L}$, or more generally, how to construct the set $post^*(\Gamma)$ for any given set of configurations $\Gamma$ of the system. Clearly, $post^*(\Gamma)$ is the least solution of the equation $X = \Gamma \cup post(X)$, and thus it is the limit of the increasing sequence of sets $(X_i)_{i \geq 0}$ where $X_0 = \Gamma$ and $X_{i+1} = X_i \cup post(X_i)$. From this fact, one can derive an iterative procedure computing the set $post^*(\Gamma)$, which consists in computing the elements of the sequence of the $X_i$'s until the inclusion $X_{i+1} \subseteq X_i$ holds for some index $i$, which means that the limit is reached. However, since the systems we are interested in have an infinite number of reachable configurations, this naive procedure does not terminate in general. Moreover, in the case of lossy channel systems, it has been shown that the set $Reach(\mathcal{L})$ cannot be effectively constructed although it is recognizable (definable by finite-state automata) [8].
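The naive iteration can be written down generically. In the Python sketch below (our own illustration), `post` is any function returning the successors of a single element, and `max_iter` is an arbitrary cutoff we add because the loop need not terminate. The procedure stops when $X_{i+1} \subseteq X_i$ and otherwise gives up, returning the last iterate. Channel contents in the two toy `post` functions are plain strings, which is an assumption made for brevity.

```python
def naive_reach(initial, post, max_iter=100):
    """Iterate X_0 = initial, X_{i+1} = X_i ∪ post(X_i).

    Returns (X, True) if the limit is reached within max_iter steps,
    and (last iterate, False) otherwise.
    """
    x = frozenset(initial)
    for _ in range(max_iter):
        nxt = x | {succ for g in x for succ in post(g)}
        if nxt <= x:                 # X_{i+1} ⊆ X_i: the limit is reached
            return x, True
        x = nxt
    return x, False

# Toy loop that (lossily) sends "a": from a^k we can reach a^j for every j <= k + 1.
def send_a(w):
    return {"a" * j for j in range(len(w) + 2)}

# Toy loop that only receives: the channel content shrinks.
def receive(w):
    return {w[1:]} if w else set()
```

With `send_a` starting from the empty channel, the iterates are exactly $X_i = \{\epsilon, a, \ldots, a^i\}$ and the cutoff is always hit, which is the non-termination problem that acceleration addresses. With `receive`, the iteration converges after a few steps.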

Hence, since an algorithm to construct the reachability sets does not exist in general, we adopt the approach of using semi-algorithms with a mechanism that enhances their chances of terminating. This mechanism is based on the idea of accelerating fixpoint computations [18,9]. For instance, consider a control loop of a lossy channel system that sends a symbol $a$ on a channel that is initially empty (by control loop we mean a circuit in the graph $(S, \delta)$). The set of all contents of the channel reachable by iterating this loop is the regular language $a^*$. However, the naive procedure given above computes successively $X_0 = \{\epsilon\}$, $X_1 = \{\epsilon, a\}$, $X_2 = \{\epsilon, a, a^2\}$, $\ldots$, and never reaches the limit. This example shows that if we are able to compute the effect of a loop on a set of configurations, we can use it to jump to the limit in one step and help the fixpoint computation converge: given a control loop $\theta$ and a set of configurations $\Gamma$, let $post^*_\theta(\Gamma)$ be the set of configurations reachable by iterating $\theta$ an arbitrary number of times starting from $\Gamma$. Then, if the $post^*_\theta$ image of any set of configurations is effectively constructible, we can consider the loop $\theta^*$ as a meta-transition of the system [6]. This means that at each step of the iterative computation of the reachability set, we add the immediate successors by the original transitions of the system as well as the successors by meta-transitions.
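For the special case of a control loop whose only channel operation is sending a fixed word $u$ on one channel, the meta-transition has a simple closed form. Starting from a downward-closed set $L$ (closed under losses), iterating the loop with losses yields all subwords of $L \cdot u^*$, i.e., $L \cdot (a_1 + \cdots + a_m)^*$, where $a_1, \ldots, a_m$ are the letters occurring in $u$. The Python sketch below is a restricted illustration of ours, not the general loop acceleration of [1]. It encodes sets as regular-expression strings, assumes single-character messages, and compares the result with the naive iterates.

```python
import re
from itertools import combinations

def subwords(words):
    """Downward closure of a finite set of strings under the subword relation."""
    return {"".join(w[i] for i in idx)
            for w in words
            for k in range(len(w) + 1)
            for idx in combinations(range(len(w)), k)}

def naive_iterate(u, i):
    """X_i of the naive iteration from the empty channel: subwords of u^0, ..., u^i."""
    return subwords(u * k for k in range(i + 1))

def accelerate_send_loop(start_regex, u):
    """Regex for post*_θ when θ only sends u; start_regex must be downward closed."""
    if not u:
        return start_regex
    letters = "".join(re.escape(c) for c in sorted(set(u)))
    return f"(?:{start_regex})[{letters}]*"
```

For the loop in the text (`u = "a"`, empty start set) the accelerated expression is equivalent to $a^*$. It contains every naive iterate in one step, and it also contains words such as `"ba"` for `u = "ab"` that appear only after losses.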

To realize this procedure, we need representation structures for sets of configurations. A good representation structure must allow a finite representation of the infinite sets of configurations we are interested in; it should at least be effectively closed under union and post, and it must have a decidable inclusion problem. Furthermore, this representation structure must allow the computation of the effects of control loops. Finally, any reasonable representation structure should be “normalizable”, i.e., for every representable set, there is a unique normal (or canonical) representation which can be derived from any alternative representation (there is a normalization procedure). Indeed, operations (e.g., entailment testing) are often easier to perform on normal forms. Furthermore, in many cases normality (canonicity) corresponds to a notion of minimality (e.g., for deterministic automata), which is crucial for practical reachability analysis procedures.

3.2 Use in verification 

Verification of invariance properties. This consists in checking whether, starting from the initial configuration of the system, a state property $\varphi$ is always satisfied. Clearly, this holds if and only if $Reach(\mathcal{L}) \subseteq [\![\varphi]\!]$, where $[\![\varphi]\!]$ is the set of configurations satisfying $\varphi$. Thus, if $Reach(\mathcal{L})$ can be computed using a class $\mathcal{C}$ of good representation structures, and if $[\![\varphi]\!]$ is also effectively representable in $\mathcal{C}$, then our problem is solvable (inclusion is decidable for good representations).
Automata-based verification of safety properties. A regular safety property is a set of finite traces over $\Sigma$. The system $\mathcal{L}$ satisfies a property $\Pi$ iff

$Traces_f(\mathcal{L}) \subseteq \Pi$   (1)

Naturally, a regular safety property $\Pi$ is represented by a deterministic finite-state labelled transition system $A_\Pi$ with set of states $Q$. This system is completed by adding a special state $bad$ to the set of states $Q$, and adding transitions $(q, \ell, bad)$ for every $q \in Q$ and $\ell \in \Sigma$ such that there is no transition in $A_\Pi$ starting from $q$ labelled by $\ell$. Let $A_\Pi^{bad}$ be the transition system so obtained, and let $\mathcal{L} \times A_\Pi^{bad}$ be the synchronous product of $\mathcal{L}$ and $A_\Pi^{bad}$. The system $\mathcal{L} \times A_\Pi^{bad}$ is a lossy channel system (with the $n$ channels of $\mathcal{L}$) whose control states are elements of $S \times (Q \cup \{bad\})$. Then, problem (1) reduces to checking whether $Reach(\mathcal{L} \times A_\Pi^{bad}) \subseteq S \times Q \times (M^*)^n$ (i.e., bad configurations are never reached).
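The completion step is easy to mechanize. In the following Python sketch, a deterministic property automaton is a dict mapping `(q, label)` to a successor state. `complete` adds the missing transitions to the sink `BAD`, and `violates` runs the completed automaton on one finite trace. The names and the example property are hypothetical. On a single finite trace, this performs the same test that the product construction performs on all traces of the system at once.

```python
BAD = "bad"

def complete(states, alphabet, delta):
    """Add (q, label) -> BAD for every q in states lacking a label-transition."""
    completed = dict(delta)
    for q in states:
        for label in alphabet:
            completed.setdefault((q, label), BAD)
    return completed

def violates(completed, init, trace):
    """True iff running the completed automaton on trace reaches BAD."""
    q = init
    for label in trace:
        q = completed[(q, label)]
        if q == BAD:
            return True
    return False

# Hypothetical property: "req" and "ack" must alternate, starting with "req".
states, alphabet = {"idle", "busy"}, {"req", "ack"}
delta = {("idle", "req"): "busy", ("busy", "ack"): "idle"}
a_bad = complete(states, alphabet, delta)
```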

It is convenient to consider a safety property $\Pi$ as a set of traces over a set of observable actions $\Omega \subseteq \Sigma$. Its verification problem then consists in checking whether $Traces_f(\mathcal{L})|_\Omega \subseteq \Pi$, where $|_\Omega$ denotes projection on $\Omega$ (i.e., hiding all symbols except those in $\Omega$). Given $A_\Pi^{bad}$ defined as previously, this problem is equivalent to $Reach(\mathcal{L} \times_\Omega A_\Pi^{bad}) \subseteq S \times Q \times (M^*)^n$, where $\times_\Omega$ is the product of labelled transition systems with synchronisation on actions in $\Omega$.

Generation of finite abstractions. A $C$-indexed language $W$ over $M$ is a mapping from $C$ to $2^{M^*}$ representing a set of $C$-indexed sequences such that $w \in W$ iff $\forall c \in C.\ w(c) \in W(c)$.

A symbolic state of $\mathcal{L}$ is a pair $\phi = (s, W)$ where $s \in S$ is a control state and $W$ is a $C$-indexed language over $M$. The symbolic state $\phi$ represents the set of configurations $[\![\phi]\!] = \{(s, w) : w \in W\}$.

Let $\Phi$ be a finite set of symbolic states. Then, the symbolic graph associated with $\Phi$ is the finite-state labelled transition system $\mathcal{G}_\Phi$ such that its set of states is $\Phi$ and $\forall \phi_1, \phi_2 \in \Phi.\ \forall \ell \in \Sigma.\ \phi_1 \xrightarrow{\ell} \phi_2$ iff $\exists \gamma_1 \in [\![\phi_1]\!], \gamma_2 \in [\![\phi_2]\!].\ \gamma_1 \overset{\ell}{\Longrightarrow} \gamma_2$. We consider as initial state in $\mathcal{G}_\Phi$ any symbolic state which contains the initial configuration $\gamma_{init}$.

In particular, we consider the partition of $Reach(\mathcal{L})$ according to the control states, i.e., $\Phi_{\mathcal{L}} = \{(s, W) : s \in S \text{ and } W = \mathcal{R}(s)\}$. The labelled transition system $\mathcal{G}_{\Phi_{\mathcal{L}}}$ is called the canonical symbolic graph of $\mathcal{L}$.
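When the reachable part of a system happens to be finite and explicitly known, the canonical symbolic graph is simply the quotient of the reachable transition graph by control states. The Python sketch below works under that finiteness assumption; Lcs builds the graph from SRE-represented symbolic states instead. It computes the nodes $(s, \mathcal{R}(s))$ and the induced labelled edges. The function name and input format are hypothetical.

```python
from collections import defaultdict

def canonical_symbolic_graph(reach, edges):
    """reach: set of reachable configurations (s, w);
    edges: set of ((s1, w1), label, (s2, w2)) weak transitions between them.
    Returns ({s: frozenset R(s)}, {(s1, label, s2)})."""
    groups = defaultdict(set)
    for s, w in reach:
        groups[s].add(w)
    nodes = {s: frozenset(ws) for s, ws in groups.items()}
    # phi1 --label--> phi2 iff some configuration of phi1 has a label-successor in phi2
    arcs = {(g1[0], label, g2[0]) for g1, label, g2 in edges}
    return nodes, arcs
```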

Lemma 1. For every finite set of symbolic states $\Phi$, if $Reach(\mathcal{L}) \subseteq \bigcup_{\phi \in \Phi} [\![\phi]\!]$, then $\mathcal{G}_\Phi$ simulates $\mathcal{L}$.

Indeed, it is easy to see that the membership relation, i.e., the relation $R$ such that $\gamma \mathrel{R} \phi$ iff $\gamma \in [\![\phi]\!]$, is a simulation relation (using the fact that every reachable configuration of $\mathcal{L}$ belongs to at least one symbolic state in $\Phi$).

Clearly, Lemma 1 holds for the canonical symbolic graph of $\mathcal{L}$. This means that if $Reach(\mathcal{L})$ can be constructed, we directly obtain a finite-state abstraction of the system $\mathcal{L}$. This abstract model can be used to check linear-time properties and, if the result is positive, to deduce that the same result holds for the concrete system. More precisely, given an $\omega$-regular linear-time property $\Pi$, i.e., a set of finite or infinite traces over $\Sigma$, a system $\mathcal{L}$ satisfies $\Pi$ if $Traces(\mathcal{L}) \subseteq \Pi$. By Lemma 1, we have $Traces(\mathcal{L}) \subseteq Traces(\mathcal{G}_{\Phi_{\mathcal{L}}})$. Hence, for every $\omega$-regular property $\Pi$, if $\mathcal{G}_{\Phi_{\mathcal{L}}}$ satisfies $\Pi$, then $\mathcal{L}$ satisfies $\Pi$ too.

Notice that if $\mathcal{G}_{\Phi_{\mathcal{L}}}$ does not satisfy $\Pi$, this could be due to the fact that the abstraction corresponding to the partition of $Reach(\mathcal{L})$ according to control states is too coarse. One could then try to check $\Pi$ on refinements of this partition.

3.3 Computing Reachability Sets of LCSs 

We introduced in [1] a new symbolic representation formalism, based on a class of regular expressions called SREs (simple regular expressions), for use in the calculation of reachability sets of lossy channel systems. We showed in that previous work that SREs are “good” representation structures in the sense introduced in Section 3.1. Below we give the definition of SREs and a brief overview of the results of [1] concerning these representations.

Definition 2 (SREs). An atomic simple expression over $M$ is a regular expression of one of the two following forms: $(a + \epsilon)$, where $a \in M$, or $(a_1 + \cdots + a_m)^*$, where $a_1, \ldots, a_m \in M$. A simple product $p$ over $M$ is either $\epsilon$ (denoting the language $\{\epsilon\}$) or a concatenation $e_1 \cdot e_2 \cdots e_n$ of atomic simple expressions over $M$. A simple regular expression (SRE) $r$ over $M$ is either $\emptyset$ (denoting the empty language) or a sum $p_1 + \cdots + p_n$ of simple products over $M$. Given an SRE $r$, we denote by $[\![r]\!]$ the language it defines. A language is said to be simply regular if it is definable by an SRE.

A $C$-indexed SRE $R$ over $M$ is a mapping from $C$ to the set of SREs. The expression $R$ defines the $C$-indexed language $L$ (denoted $[\![R]\!]$) such that, for every $c \in C$, $L(c) = [\![R(c)]\!]$. A $C$-indexed language is said to be simply recognizable if it is a finite union of languages definable by $C$-indexed SREs.
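To make the syntax concrete, here is one possible Python encoding, which is our own choice and not the data structure of [1] or Lcs. An atom is `("opt", a)` for $(a + \epsilon)$ or `("star", A)` for $(a_1 + \cdots + a_m)^*$, with `A` a nonempty frozenset. A simple product is a tuple of atoms (the empty tuple is $\epsilon$), and an SRE is a list of products (the empty list is $\emptyset$). Membership is decided by translating into a Python regular expression, assuming single-character messages.

```python
import re

def atom_regex(atom):
    kind, arg = atom
    if kind == "opt":                          # (a + ε)
        return f"(?:{re.escape(arg)})?"
    letters = "".join(re.escape(c) for c in sorted(arg))
    return f"[{letters}]*"                     # (a1 + ... + am)*, alphabet assumed nonempty

def sre_regex(sre):
    """Regex for an SRE given as a list of products; None stands for ∅."""
    if not sre:
        return None
    return "|".join("(?:" + "".join(atom_regex(e) for e in p) + ")" for p in sre)

def member(word, sre):
    """Decide whether word belongs to the language of sre."""
    rx = sre_regex(sre)
    return rx is not None and re.fullmatch(rx, word) is not None
```

For instance, `[(("opt", "a"), ("star", frozenset("bc")))]` encodes $(a + \epsilon) \cdot (b + c)^*$, which accepts `"abcb"` but not `"ba"`.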

Any set of configurations $\Gamma$ is a union of the form $\bigcup_{s \in S} \{s\} \times W_s$, where the $W_s$'s are $C$-indexed languages over $M$. We say that $\Gamma$ is SRE definable if $W_s$ is simply recognizable for each $s \in S$.

For a lossy channel system $\mathcal{L}$, the set $Reach(\mathcal{L})$ is SRE definable (the set $\mathcal{R}(s)$ is simply recognizable for each control state $s$) [1]. This means that SREs are expressive enough to represent the reachability set of any lossy channel system. However, as we mentioned before, there is, in general, no algorithm for computing a representation of $Reach(\mathcal{L})$ for a lossy channel system $\mathcal{L}$ [8].

An entailment relation can be defined on SREs: for SREs $r_1$ and $r_2$, we say that $r_1$ entails $r_2$ (we write $r_1 \sqsubseteq r_2$) if $[\![r_1]\!] \subseteq [\![r_2]\!]$. This relation is extended to indexed SREs in the obvious manner. It can be shown that entailment among indexed SREs can be checked in quadratic time [1].
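The paper defers the entailment algorithm to [1]. The following Python sketch is our own illustration of one way to decide it, not necessarily the procedure of [1]. Atoms are encoded as `("opt", a)` for $(a + \epsilon)$ and `("star", A)` for $(a_1 + \cdots + a_m)^*$ (`A` a nonempty frozenset), and products as tuples of atoms. The sketch relies on two facts that we state as its assumptions. First, SRE languages are closed under taking subsequences, which justifies the greedy rules in the comments. Second, a simple product is included in a sum of products only if it is included in one of them, which yields the quadratic check for SREs.

```python
def product_entails(p1, p2):
    """Decide [[p1]] ⊆ [[p2]] for simple products (tuples of atoms), greedily."""
    i, j = 0, 0
    while i < len(p1):
        if j == len(p2):
            return False              # p1 still has a nonempty word, p2 is ε
        e1, e2 = p1[i], p2[j]
        if e1[0] == "opt":
            a = e1[1]
            if e2[0] == "star" and a in e2[1]:
                i += 1                # (a+ε) is absorbed by the star e2
            elif e2 == e1:
                i, j = i + 1, j + 1   # match (a+ε) against (a+ε)
            else:
                j += 1                # a is not a letter of e2: skip e2
        else:
            if e2[0] == "star" and e1[1] <= e2[1]:
                i += 1                # A* is absorbed by a star over a superset
            else:
                j += 1                # e2 cannot cover A*: skip e2
    return True                       # ε entails every product

def sre_entails(r1, r2):
    """[[r1]] ⊆ [[r2]]: every product of r1 is entailed by some product of r2."""
    return all(any(product_entails(p, q) for q in r2) for p in r1)
```

Each step advances one of the two indices, so checking two products takes linear time. Comparing every pair of products gives the quadratic bound for SREs.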

Definition 3 (Normal form). A simple product $e_1 \cdots e_n$ is said to be normal if $\forall i \in \{1, \ldots, n-1\}.\ e_i \cdot e_{i+1} \not\sqsubseteq e_{i+1}$ and $e_i \cdot e_{i+1} \not\sqsubseteq e_i$. An SRE $r = p_1 + \cdots + p_n$ is said to be normal if $\forall i \in \{1, \ldots, n\}.\ p_i$ is normal, and $\forall i, j \in \{1, \ldots, n\}.\ i \neq j \Rightarrow p_i \not\sqsubseteq p_j$.

¹ This approach can also be applied to branching-time properties expressed in universal positive fragments of temporal logics or μ-calculi like ∀CTL* [15] or $\Box L_\mu$ [2].

It can be shown that for each SRE $r$, there is a unique (up to commutativity of products) normal SRE, denoted $\hat{r}$, such that $[\![\hat{r}]\!] = [\![r]\!]$, and which can be derived from $r$ in quadratic time [1].
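For products, the normality condition of Definition 3 only compares neighbouring atoms. For two atoms, $e \cdot f \sqsubseteq f$ holds exactly when $f$ is a star whose alphabet contains all letters of $e$, and symmetrically for $e \cdot f \sqsubseteq e$. In either case the pair can be replaced by the absorbing atom. The Python sketch below normalizes a single product this way. It uses our own encoding: atoms are `("opt", a)` for $(a + \epsilon)$ and `("star", A)` for $(a_1 + \cdots + a_m)^*$, and products are tuples. Normalizing a full SRE would additionally remove every product entailed by another one, which needs an entailment check and is omitted here.

```python
def absorbs(big, small):
    """True if big is a star atom whose alphabet contains every letter of small."""
    letters = {small[1]} if small[0] == "opt" else small[1]
    return big[0] == "star" and letters <= big[1]

def normalize_product(p):
    """Drop atoms absorbed by a neighbouring star until the product is normal."""
    atoms = list(p)
    changed = True
    while changed:
        changed = False
        for i in range(len(atoms) - 1):
            e, f = atoms[i], atoms[i + 1]
            if absorbs(f, e) or absorbs(e, f):
                # e·f denotes the same language as the absorbing atom: keep only that one
                del atoms[i if absorbs(f, e) else i + 1]
                changed = True
                break
    return tuple(atoms)
```

For example, $(a + \epsilon) \cdot (a + b)^* \cdot (b + \epsilon) \cdot b^*$ normalizes to $(a + b)^*$, while $(a + \epsilon) \cdot (a + \epsilon)$ is already normal.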

Finally, we can show that, for a lossy channel system $\mathcal{L}$ and an SRE-representable set of configurations $\Gamma$, the set $post(\Gamma)$ is SRE definable and effectively constructible in linear time, and that for any control loop $\theta$ in $\mathcal{L}$, the set $post^*_\theta(\Gamma)$ is also SRE definable and effectively constructible in quadratic time [1].
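The linear-time $post$ can be illustrated on a single channel. With lossy semantics, sending $a$ maps a simple product $p$ to $p \cdot (a + \epsilon)$. Receiving $a$ keeps the words $w$ with $a \cdot w \in [\![p]\!]$, computed by scanning left to right:

- An atom $(b + \epsilon)$ with $b \neq a$, or a star without $a$, must contribute the empty word and is skipped.
- An atom $(a + \epsilon)$ is consumed.
- A star containing $a$ is kept together with the rest of the product.

This Python sketch is our reconstruction for illustration, not the code of Lcs, and it does not cover loop acceleration. It uses our own encoding: atoms `("opt", a)` / `("star", A)`, products as tuples, SREs as lists of products.

```python
def send(p, a):
    """Lossy !a on a simple product: [[p]]·(a + ε)."""
    return p + (("opt", a),)

def receive(p, a):
    """Lossy ?a on a simple product: {w : a·w ∈ [[p]]}, or None if empty."""
    for i, (kind, arg) in enumerate(p):
        if kind == "opt" and arg == a:
            return p[i + 1:]          # consume (a + ε)
        if kind == "star" and a in arg:
            return p[i:]              # the star produces the a and stays
        # otherwise this atom must contribute the empty word: skip it
    return None

def apply_op(sre, op):
    """Apply ("!", a) or ("?", a) to every product of an SRE (list of products)."""
    kind, a = op
    results = (send(p, a) if kind == "!" else receive(p, a) for p in sre)
    return [q for q in results if q is not None]
```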

4 Implementation 

We implemented our techniques in a tool prototype called Lcs. The input of Lcs is a finite set of communicating automata, given separately. The tool then offers the following options:

Generation of the reachability set: The tool allows calling a procedure 
which computes a representation of the reachability set of the system by means 
of (normal) SREs. The computation is done according to a depth-first-search 
strategy, and uses the acceleration principle (see Sections 3 and 3.3): Starting 
from the initial configuration, the procedure explores a graph where nodes are 
symbolic states. When visiting a node, the procedure computes its immediate 
successors using the post function. Whenever a control loop is detected, i.e., the 
current symbolic state has an ancestor with the same control state, the effect 
of iterating this loop is computed, leading to a new symbolic state. Notice that 
the loops used for acceleration are found on-the-fly and are not explicitly given 
by the user. The set of reachable configurations is accumulated progressively. If a visited node (symbolic state) is included in the set of reachable configurations computed so far, the successors of the node are not generated. Otherwise, its set of configurations is added to the current set of reachable configurations, and the search continues.
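The exploration strategy can be sketched independently of SREs. To keep the example short, we swap in a much simpler domain, which is our assumption and not the one used by Lcs. There is a single channel carrying only the message a, so a downward-closed set of contents $\{a^k : k \le n\}$ is represented by its bound $n$, with `math.inf` standing for $a^*$. Acceleration is then the Karp–Miller-style rule that a loop which strictly increases the bound is iterated to infinity. Subsumption and the depth-first search follow the description above.

```python
import math

def step(n, op):
    """Effect of a lossy operation on the bound n (None if a receive is disabled)."""
    if op == "!":
        return n + 1
    if op == "?":
        return n - 1 if n >= 1 else None
    return n                                  # nop

def explore(init_state, transitions):
    """DFS over symbolic states (control, bound); returns control -> largest bound."""
    reached = {}

    def visit(state, n, ancestors):
        # Acceleration: an ancestor with the same control state and a smaller
        # bound closes a loop that can be iterated, giving a* on the channel.
        if any(s == state and m < n for s, m in ancestors):
            n = math.inf
        # Subsumption: skip the node if it is included in what was already reached.
        if n <= reached.get(state, -1):
            return
        reached[state] = n
        for s1, op, s2 in transitions:
            if s1 == state:
                succ = step(n, op)
                if succ is not None:
                    visit(s2, succ, ancestors + [(state, n)])

    visit(init_state, 0, [])                  # initially the channel is empty
    return reached
```

With a self-loop sending a on `s0`, the second visit of `s0` is accelerated to an unbounded channel and the search terminates. Without such a loop, the exact bounds are computed.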

Generation of the canonical symbolic graph: During the computation of the reachability set, the Lcs tool can construct the corresponding canonical symbolic graph (transitions between symbolic states).

The symbolic graph is produced in the input format of the Cadp toolbox (Caesar/Aldebaran Development Package) [11], which contains several tools on finite-state labelled transition systems, e.g., graphical visualisation, comparison with respect to various behavioural equivalences and preorders like observational bisimulation and simulation, minimization, on-the-fly automata-based verification, and model checking for an ACTL-like temporal logic (action-based variant of CTL) and the alternation-free modal μ-calculus.

On-the-fly checking of safety properties: Given a safety property $\Pi$ described as a deterministic labelled transition system over a set of observable actions $\Omega \subseteq \Sigma$, the tool checks whether the projection of the system on $\Omega$ satisfies $\Pi$. This verification (based on a reachability set generation, see Section 3.2) is done on-the-fly: the procedure stops as soon as a bad configuration is encountered.
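The on-the-fly principle is independent of the symbolic representation, so we illustrate it on an explicitly given finite system; this is an assumption of the sketch, since Lcs explores symbolic states. The search runs over the product with the property automaton and synchronizes only on observable actions. It treats a missing property transition as a move to `bad`, and returns the first violating trace as soon as one is found. All names are hypothetical.

```python
from collections import deque

BAD = "bad"

def check_safety(sys_init, sys_edges, prop_init, prop_delta, observable):
    """Breadth-first search of the product; returns a violating trace or None.

    sys_edges: dict state -> list of (label, next_state)
    prop_delta: dict (q, label) -> q' (deterministic, possibly partial)
    observable: set of labels the property synchronizes on (Ω)
    """
    start = (sys_init, prop_init)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        s, q = node
        for label, s2 in sys_edges.get(s, []):
            # Unobservable actions leave the property state unchanged.
            q2 = prop_delta.get((q, label), BAD) if label in observable else q
            succ = (s2, q2)
            if succ in parent:
                continue
            parent[succ] = (node, label)
            if q2 == BAD:                     # stop at the first bad configuration
                trace = []
                while parent[succ] is not None:
                    succ, lab = parent[succ]
                    trace.append(lab)
                return trace[::-1]
            queue.append(succ)
    return None
```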
5 The Bounded Retransmission Protocol

5.1 Specification of the service 

The Bounded Retransmission Protocol (BRP for short) is a data link protocol. 
The service it delivers is to transmit large files (sequences of data of arbitrary 
lengths) from one client to another one. Each datum is transferred in a separate 
frame. Both clients, the sender and the receiver, obtain an indication whether 
the whole file has been delivered successfully or not. 

More precisely, at the sender side, the protocol requests a sequence of data $s = d_1, \ldots, d_n$ (action REQ) and communicates a confirmation which can be SOK, SNOK, or SDNK. The confirmation SOK means that the file has been transferred successfully, SNOK means that the file has not been transferred completely, and SDNK means that the file may not have been transferred completely. This occurs when the last datum is sent but not acknowledged. At the receiver side, the protocol delivers each correctly received datum with an indication which can be RFST, RINC, or ROK. The indication RFST means that the delivered datum is the first one and more data will follow, RINC means that the datum is an intermediate one, and ROK means that this was the last datum and the file is complete. However, when the connection with the sender is broken, an indication RNOK is delivered (without datum). The service must satisfy the following properties:

1. a request REQ must be followed by a confirmation (SOK, SNOK, or SDNK) 
before the next request, 

2. a RFST indication (delivery of the first datum) must be followed by one of the two indications ROK or RNOK before the beginning of a new transmission (next request of the sender),

3. a SOK confirmation must be preceded by a ROK indication, 

4. a ROK indication can be followed by either a SOK or a SDNK confirmation, 
but never by a SNOK (before next request), 

5. a RNOK indication must be preceded by SNOK or SDNK (abortion), 

6. if the first datum has been received (with the RFST indication), then a SNOK confirmation is followed by a RNOK indication before the next request.
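Several of these requirements are safety properties that can be checked on a finite trace of external actions. The monitors below are a hypothetical Python sketch of properties 1, 3 and 4 only. Two readings are our interpretation: "preceded by" and "followed by" are taken relative to the current transmission (since the last REQ), and property 1 is read as a safety property, so an unfinished transmission at the end of a finite trace is accepted.

```python
CONFIRMATIONS = {"SOK", "SNOK", "SDNK"}

def property1(trace):
    """1. After REQ, a confirmation must occur before the next REQ."""
    pending = False
    for act in trace:
        if act == "REQ":
            if pending:
                return False
            pending = True
        elif act in CONFIRMATIONS:
            pending = False
    return True

def property3(trace):
    """3. Every SOK is preceded by an ROK in the same transmission."""
    rok_seen = False
    for act in trace:
        if act == "REQ":
            rok_seen = False
        elif act == "ROK":
            rok_seen = True
        elif act == "SOK" and not rok_seen:
            return False
    return True

def property4(trace):
    """4. After an ROK, no SNOK may occur before the next REQ."""
    after_rok = False
    for act in trace:
        if act == "REQ":
            after_rok = False
        elif act == "ROK":
            after_rok = True
        elif act == "SNOK" and after_rok:
            return False
    return True
```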

5.2 Description of the protocol 

The BRP consists of two processes, the sender S and the receiver R, that communicate through two lossy fifo channels K and L: messages can either be lost or arrive in the same order in which they are sent. The BRP can be seen as an extended version of the alternating bit protocol. Messages sent from the sender S to the receiver R through the channel K are frames of the form (first, last, toggle, datum), where a datum is accompanied by three bits: first and last indicate whether the datum is the first or the last one of the considered file, and toggle is the alternating bit that allows duplicated intermediate frames to be detected. As for the acknowledgments (sent from R to S through L), they are frames of the form (first, last, toggle). Notice that in the description of the BRP we consider, the value of toggle is relevant only for intermediate frames. Indeed, the first and last frames can be distinguished from the intermediate ones using the booleans first and last.

The behaviours of S and R are the following. The sender S starts by reading
(action REQ) a sequence $s = d_1, \ldots, d_n$. We consider here that $n \geq 2$;
the case $n = 1$ does not introduce any difficulty. Then, S sends to R through K
the first data frame $(1, 0, 0, d_1)$ and waits for the acknowledgement. Let us first
consider the ideal case where frames are never lost. When R receives the frame
from K, it delivers to its client the datum $d_1$ with the indication RFST, and
sends to S an acknowledgement frame (1, 0, 0) through the channel L. When S
receives this acknowledgement, it transmits to R the second frame $(0, 0, 0, d_2)$
(toggle is still equal to 0, since its value is relevant only for intermediate frames
and this is the first of them). Then, after reception, R delivers $d_2$ with the
indication RINC and sends the acknowledgement (0, 0, 0) to S. Then, the next
frame sent by S is $(0, 0, 1, d_3)$ (now toggle has flipped), and the same procedure
is repeated until the last frame $(0, 1, -, d_n)$ is sent (here again, as in the case of
the first frame, the value of toggle is not relevant). When R receives the last
frame, it delivers $d_n$ with the indication ROK and acknowledges receipt. Then,
the sender S communicates to its client the confirmation SOK, meaning that the
whole sequence s has been successfully transmitted.
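The loss-free run is determined by the sequence of frames that S emits. The Python sketch below reproduces that sequence under our own encoding assumptions, which are not part of the protocol description: a frame is a tuple (first, last, toggle, datum), `None` stands for an irrelevant toggle bit, and the acknowledgement of a frame is the same tuple without its datum. The helper names `data_frames` and `acknowledgement` are hypothetical.

```python
from typing import List, Optional, Tuple

# (first, last, toggle, datum); toggle is None where its value is irrelevant
Frame = Tuple[int, int, Optional[int], str]


def data_frames(data: List[str]) -> List[Frame]:
    """Frames sent by S when nothing is lost, for a file d_1, ..., d_n with n >= 2."""
    if len(data) < 2:
        raise ValueError("this sketch only covers files with at least two data")
    frames: List[Frame] = [(1, 0, 0, data[0])]  # first frame
    toggle = 0
    for datum in data[1:-1]:                    # intermediate frames
        frames.append((0, 0, toggle, datum))
        toggle ^= 1                             # alternate the bit for the next one
    frames.append((0, 1, None, data[-1]))       # last frame
    return frames


def acknowledgement(frame: Frame) -> Tuple[int, int, Optional[int]]:
    """R acknowledges a frame by sending back its (first, last, toggle) part."""
    return frame[:3]


print(data_frames(["d1", "d2", "d3"]))
# [(1, 0, 0, 'd1'), (0, 0, 0, 'd2'), (0, 1, None, 'd3')]
```

Only the intermediate frames alternate their toggle bit, which is what lets R recognise a retransmitted intermediate frame as a duplicate.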

Now, let us consider the case where frames are lost. When S sends a frame
and realizes that it may have been lost (a timer $T_S$ expires and it has not
received a corresponding acknowledgement from R), it retransmits the same frame
and waits again for the acknowledgement. However, it can try only up to a fixed
maximal number of retransmissions MAX, which is a parameter of the protocol.
So, the sender maintains a counter of retransmissions CR, and when CR reaches
the value MAX, it gives up and concludes that the connection with the receiver
is broken. Then, it informs its client that a failure occurred by communicating
one of the two confirmations: SNOK if the frame in consideration is not the last
frame of the sequence, or SDNK if it is the last one (the sender cannot know
whether the frame was lost or its acknowledgement was lost). On the other side,
the receiver R also uses a timer $T_R$ to measure the time elapsed between the
arrival of two different frames. When R receives a new frame, it resets $T_R$ and
delivers the transmitted datum with the corresponding indication; otherwise (the
frame is a duplicate) it resends the last acknowledgement. If the timer expires,
it concludes that the connection with the sender is broken and delivers an
indication RNOK, meaning that the transmission failed. Notice that if the first
frame is continuously lost, the receiver has no way to detect that the sender is
trying to start a new file transmission. In addition, two assumptions are made on
the behaviour of S and R:

A1 R must not conclude prematurely that the connection with S is broken. 

A2 In case of abortion, S cannot start transmitting frames of another file until
R has reacted to abortion and informed its client.

Assumption A1 means that $T_R$ must be large enough to allow MAX retransmis-
sions of a frame. Assumption A2 can be implemented, for instance, by requiring
S to wait long enough after abortion to be sure that $T_R$ has expired.
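The sender's bounded retransmission discipline can be sketched as follows. This Python fragment is a hypothetical abstraction written for illustration, not the protocol itself: it ignores the receiver and the frame contents, and it models the lossy channels by an iterator that says, for each transmission attempt, whether both the frame and its acknowledgement arrived before the sender's timer expired. Each frame is sent once and then retransmitted at most MAX times (the parameter `max_retrans`).

```python
from typing import Iterator


def send_file(n_frames: int, max_retrans: int, round_trip_ok: Iterator[bool]) -> str:
    """Confirmation given by S to its client for one file of n_frames frames."""
    for index in range(n_frames):
        retransmissions = 0                       # the counter CR
        while not next(round_trip_ok):            # timer expired without an acknowledgement
            if retransmissions == max_retrans:    # CR reached MAX: give up
                is_last = index == n_frames - 1
                return "SDNK" if is_last else "SNOK"
            retransmissions += 1                  # resend the same frame
    return "SOK"


print(send_file(3, 2, iter([True, False, False, False])))  # SNOK
```

The distinction between SNOK and SDNK only depends on which frame exhausted its retransmissions, exactly as in the description above.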
5.3 Modeling the BRP as a Lossy Channel System

We model the BRP as a lossy channel system which consists of two communicat- 
ing finite-state machines, the sender S and the receiver R represented in Figures 
1 and 2 (with obvious notational conventions). For that, we proceed as follows: 




$Op_1 = \neg rtrans \wedge empty(K) \wedge empty(L) \mapsto abort := false$
$Op_2 = empty(L) \mapsto abort := true$

Fig. 1. The sender S



Frames: Since the control of the BRP does not depend on the transmitted
data, we hide their values and consider only the information (first, last, toggle).
The set of relevant information of this form corresponds to a finite alphabet
M = {fst, last, 0, 1}, where fst (resp. last) represents the first (resp. last) frame,
and 0 and 1 represent the intermediate frames, since only toggle is relevant in
this case.
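The abstraction of frames onto the alphabet M is a simple projection. In the Python sketch below, the function name `abstract` and the encoding of the control bits as integers (with `None` for an irrelevant toggle) are our own assumptions; the datum is dropped, and the toggle is kept only for intermediate frames. The same function applies to acknowledgements, which carry the same control bits.

```python
from typing import Optional


def abstract(first: int, last: int, toggle: Optional[int]) -> str:
    """Map the control bits of a frame or acknowledgement to M = {fst, last, 0, 1}."""
    if first:
        return "fst"
    if last:
        return "last"
    if toggle not in (0, 1):
        raise ValueError("an intermediate frame needs a toggle bit")
    return str(toggle)


frames = [(1, 0, 0, "d1"), (0, 0, 0, "d2"), (0, 0, 1, "d3"), (0, 1, None, "d4")]
print([abstract(*frame[:3]) for frame in frames])  # ['fst', '0', '1', 'last']
```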

The number of transmitted frames: Since it is only relevant whether a frame is
the first one, the last one, or an intermediate one, we abstract from the actual
value n corresponding to the size of the transmitted sequence of frames, and
consider that it can be any positive integer, chosen nondeterministically (by the
sender).





220 Parosh Abdulla, Aurore Annichini, and Ahmed Bouajjani 




Fig. 2. The receiver R 



Time-outs: Since our model is untimed, we cannot express time-outs explic-
itly. Instead, we consider that the sender and the receiver decide nondeterminis-
tically when time-outs occur, provided that their corresponding input channels
are empty (we use channel emptiness testing).

The counter CR and the value MAX: It is only relevant whether CR < MAX or
CR has reached MAX. Therefore, we consider that the sender can resend frames
an arbitrary number of times before considering that MAX is reached and
deciding to abort the transmission. This makes the size of the channels K and L
unbounded. Our model is thus an abstraction of the whole family of BRPs for
arbitrary values of MAX.

Assumptions A1 and A2: Again, since our model is untimed, we cannot im-
pose real-time constraints to implement the assumptions A1 and A2. Instead,
we use Boolean shared variables to synchronise the sender and the receiver. We
consider the following two variables: abort, which tells whether the sender has
decided abortion, and rtrans, which tells whether the receiver considers that the
transmission of a sequence of frames has started and is not finished yet, i.e.,
from the moment it receives the first frame until it informs its client that the
transmission is terminated, either successfully or not.





Symbolic Verification of Lossy Channel Systems 221 



5.4 Verifying the Bounded Retransmission Protocol 

To verify the BRP, we proceed as follows. First, we use our Lcs tool to
automatically generate the set of reachable configurations of the BRP and the
corresponding canonical symbolic graph. The obtained graph has 24 symbolic
states and 61 transitions. The execution time is 0.56 seconds (UltraSparc).

Then, we use the tool Aldebaran to minimize this graph according to the
observational trace equivalence where the set of observable actions is {REQ, SOK,
SNOK, SDNK, RFST, RINC, ROK, RNOK}. We obtain the finite-state labelled
transition system with 5 states and 10 transitions given in Figure 3. Properties
such as those given in Section 5.1 are expressible in ACTL (the action-based
variant of CTL) and can be automatically model checked on the obtained finite-
state abstract model of the BRP.

Fig. 3. The minimized symbolic graph of the BRP

6 Conclusion 

We have presented a symbolic approach for automatically verifying a class of
infinite-state systems: the class of unbounded lossy channel systems. This ap-
proach is based on a procedure for constructing the set of reachable configurations
of the system by means of a symbolic representation (SREs), and on acceleration
techniques based on the analysis of the effect of control loops. In addition to
the generation of the reachability set of a system, we showed that this approach
allows the automatic generation of a finite abstract model of the system, which
can be used for checking various properties by means of standard finite-state
verification methods.

We applied this approach to the non-trivial example of the BRP. We showed
that considering unbounded channels allows parametric reasoning: unbounded-
ness of the channels models the fact that the number of retransmissions can be
any arbitrary positive integer. Our experimentation with the Lcs tool shows that
the algorithmic approach we adopt is quite effective. For a first prototype, we
obtained quite satisfactory performance.
Abstract. We show that Constraint Logic Programming (CLP) can
serve as a conceptual basis and as a practical implementation platform 
for the model checking of infinite-state systems. Our contributions are: 
(1) a semantics-preserving translation of concurrent systems into CLP 
programs, (2) a method for verifying safety and liveness properties on the 
CLP programs produced by the translation. We have implemented the 
method in a CLP system and verified well-known examples of infinite- 
state programs over integers, using here linear constraints as opposed to 
Presburger arithmetic as in previous solutions. 



1 Introduction 

Automated verification methods can today be applied to practical sys- 
tems [McM93]. One reason for this success is that implicit representations of 
finite sets of states through Boolean formulas can be handled efficiently via 
BDD’s [BCM+90]. The finiteness is an inherent restriction here. Many systems, 
however, operate on data values from an infinite domain and are intrinsically 
infinite-state; i.e., one cannot produce a finite-state model without abstracting 
away crucial properties. There has been much recent effort in verifying such sys- 
tems (see e.g. [ACJT96,BW98,BGP97,CJ98,HHWT97,HPR97,LPY97,SKR98]). 
One important research goal is to find appropriate data structures for implicit 
representations of infinite sets of states, and design model checking algorithms 
that perform well on practical examples. 

It is obvious that the metaphor of constraints is useful, if not unavoidable, for
the implicit representation of sets of states (simply because constraints represent
a relation and states are tuples of values). The question is whether and how the
concepts and the systems for programming over constraints as first-class data
structures (see e.g. [Pod94,Wal96]) can be used for the verification of infinite-
state systems. The work reported in this paper investigates Constraint Logic
Programming (see [JM94]) as a conceptual basis and as a practical implementa-
tion platform for model checking.

We present a translation from concurrent systems with infinite state spaces
to CLP programs that preserves the semantics in terms of transition sequences.
The formalism of ‘concurrent systems’ is a widely-used guarded-command speci-
fication language with shared variables promoted by Shankar [Sha93]. Using this
translation, we exhibit the connection between states and ground atoms, between
sets of states and constrained facts, between the pre-condition operator and the
logical consequence operator of CLP programs, and, finally, between CTL prop-
erties (safety, liveness) and model-theoretic or denotational program semantics.
This connection suggests a natural approach to model checking for infinite-state
systems using CLP. We explore the potential of this approach practically by
using one of the existing CLP systems with different constraint domains as an
implementation platform. We have implemented an algorithm to compute fix-
points for CLP programs using constraint solvers over reals and Booleans. The
implementation amounts to a simple and direct form of meta-programming: the
input is itself a CLP program; constraints are syntactic objects that are passed
to and from the built-in constraint solver; the fixpoint iteration is a source-to-
source transformation for CLP programs.

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 223-239, 1999.

We have obtained experimental results for several examples of infinite-state 
programs; these examples are quickly becoming benchmarks in the community 
(see e.g. [BGP97,BGP98,SKR98,SUM96,LS97]). Our experiments allow us to see 
that a CLP-based tool can solve the considered verification problems at accept- 
able time cost. Moreover, as CLP combines mathematical and logical reasoning, 
the CLP-based setting helps to find optimizations that are natural, directly im- 
plementable and provably correct. This is important since verification is a hard 
problem (undecidable in the general infinite-state case) and often requires a 
fine-tuning of the method. 

Finally, the experiments show that, perhaps surprisingly, the powerful (triple- 
exponential time) decision procedure for Presburger Arithmetic used in other 
approaches [BGP98,SKR98,BW94] for the same verification problems is not 
needed; instead, the (polynomial-time) consistency and entailment tests for lin- 
ear arithmetic constraints (without disjunction) that are provided by CLP sys- 
tems are sufficient. 

2 Translating Concurrent Systems into CLP

We take the bakery algorithm (see [BGP97]) as an example of a concurrent 
program, using the notation of [MP95]: 

$\mathbf{begin}\ turn_1 := 0;\ turn_2 := 0;\ P_1 \parallel P_2\ \mathbf{end}$

where $P_1 \parallel P_2$ is the parallel execution of the subprograms $P_1$ and $P_2$, and $P_1$
is defined by:



repeat
    think: turn1 := turn2 + 1;
    wait:  when turn1 < turn2 or turn2 = 0 do
    use:       critical section;
               turn1 := 0
forever



and $P_2$ is defined symmetrically. The algorithm ensures the mutual exclusion
property (at most one of the two processes is in the critical section at every point
of time). The integer values of the two variables $turn_1$ and $turn_2$ in reachable
states are unbounded; note that a process can enter wait before the other one
has reset its counter to 0.

The concurrent program above can be directly encoded as the concurrent
system S in Figure 1 following the scheme in [Sha93]. Each process is associated
with a control variable ranging over the control locations (i.e. program labels).
The data variables correspond to the program variables. The states of S are
tuples of control and data values, e.g. (think, think, 0, 3). The primed version
of a variable in an action stands for its successor value. We omit conjuncts like
$p_2' = p_2$ expressing that the value remains unchanged.



Control variables  $p_1, p_2$ : {think, wait, use}
Data variables  $turn_1, turn_2$ : int
Initial condition  $p_1 = think \wedge p_2 = think \wedge turn_1 = turn_2 = 0$
Events  cond $p_1 = think$ action $p_1' = wait \wedge turn_1' = turn_2 + 1$
        cond $p_1 = wait \wedge turn_1 < turn_2$ action $p_1' = use$
        cond $p_1 = wait \wedge turn_2 = 0$ action $p_1' = use$
        cond $p_1 = use$ action $p_1' = think \wedge turn_1' = 0$
        ... symmetrically for Process 2

Fig. 1. Concurrent system S specifying the bakery algorithm
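Because the turn values grow without bound, S has infinitely many reachable states, so explicit enumeration can only ever cover a bounded part of its state space. The Python sketch below is merely an illustration of the events in Figure 1, not a verification method from this paper: it encodes a state as (p1, p2, turn1, turn2), explores all states reachable within a depth bound `max_depth` of our own choosing, and lets one check mutual exclusion on that finite fragment.

```python
from collections import deque
from typing import List, Set, Tuple

State = Tuple[str, str, int, int]  # (p1, p2, turn1, turn2)


def successors(s: State) -> List[State]:
    """Successor states under the events of Figure 1 (both processes)."""
    p1, p2, t1, t2 = s
    result: List[State] = []
    if p1 == "think":
        result.append(("wait", p2, t2 + 1, t2))
    if p1 == "wait" and (t1 < t2 or t2 == 0):   # the two 'wait' events of Process 1
        result.append(("use", p2, t1, t2))
    if p1 == "use":
        result.append(("think", p2, 0, t2))
    if p2 == "think":                            # Process 2, symmetrically
        result.append((p1, "wait", t1, t1 + 1))
    if p2 == "wait" and (t2 < t1 or t1 == 0):
        result.append((p1, "use", t1, t2))
    if p2 == "use":
        result.append((p1, "think", t1, 0))
    return result


def reachable_within(max_depth: int) -> Set[State]:
    """Breadth-first search from the initial state, up to max_depth transitions."""
    init: State = ("think", "think", 0, 0)
    seen = {init}
    frontier = deque([(init, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


states = reachable_within(12)
print(all(not (p1 == p2 == "use") for p1, p2, _, _ in states))  # True
```

Increasing `max_depth` keeps producing states with larger turn values, which is why a finite enumeration cannot establish the property for all reachable states; this motivates the symbolic treatment below.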



Following the scheme proposed in this paper, we translate the concurrent system
for the bakery algorithm into the CLP program shown in Figure 2 (here, p is
a dummy predicate symbol, think, wait, and use are constants, and variables
are capitalized; note that we often separate conjuncts by commas instead of
using “$\wedge$”).



$init \leftarrow Turn_1 = 0, Turn_2 = 0, p(think, think, Turn_1, Turn_2)$
$p(think, P_2, Turn_1, Turn_2) \leftarrow Turn_1' = Turn_2 + 1, p(wait, P_2, Turn_1', Turn_2)$
$p(wait, P_2, Turn_1, Turn_2) \leftarrow Turn_1 < Turn_2, p(use, P_2, Turn_1, Turn_2)$
$p(wait, P_2, Turn_1, Turn_2) \leftarrow Turn_2 = 0, p(use, P_2, Turn_1, Turn_2)$
$p(use, P_2, Turn_1, Turn_2) \leftarrow Turn_1' = 0, p(think, P_2, Turn_1', Turn_2)$
... symmetrically for Process 2

Fig. 2. CLP program $P_S$ for the concurrent system S in Figure 1.



If the reader is not familiar with CLP, the following is all one needs to know for
this paper.¹ A CLP program is a logical formula, namely a universally quantified
conjunction of implications (as in Figure 2; it is common to call the implications
clauses and to write their conjunction as a set). Its first reading is the usual
first-order logic semantics. We give it a second reading as a non-deterministic
sequential program. The program states are atoms, i.e., applications of the pred-
icate p to values such as $p(think, think, 0, 3)$. The successor state of a state s is
any atom s' such that the atom s is a direct logical consequence of the atom s'
under the program formula. This, in turn, is the case if and only if the implication
$s \leftarrow s'$ is an instance of one of the implications.

¹ If the reader is familiar with CLP, note that we are proposing a paradigm shift:
instead of looking at the synthesis of operational behavior from programs viewed as
executable specifications, we are interested in the analysis of operational behavior
through CLP programs obtained by a translation. The classical correspondence be-
tween denotational semantics and operational semantics (for ground derivations) is
central again.

For example, the state $p(think, think, 0, 3)$ has as a possible successor the
state $p(wait, think, 4, 3)$, since $p(think, think, 0, 3) \leftarrow p(wait, think, 4, 3)$ is an
instance of the first implication for p (instantiate the variables with $P_2 = think$,
$Turn_1 = 0$, $Turn_1' = 4$ and $Turn_2 = 3$).

A sequence of atoms such that each atom is a direct logical consequence of
its successor in the sequence (i.e., a transition sequence of program states) is
called a ground derivation of the CLP program.

In the following, we will always implicitly identify a state of a concur-
rent system S with the corresponding atom of the CLP program $P_S$;
for example, $(think, think, 0, 3)$ with $p(think, think, 0, 3)$.

We observe that the transition sequences of the concurrent system S in Fig-
ure 1 are exactly the ground derivations of the CLP program $P_S$ in Figure 2.
Moreover, the set of all predecessor states of a set of states in S is the set of its
direct logical consequences under the CLP program $P_S$. We will show that these
facts hold in general and use them to characterize CTL properties in terms
of the denotational (fixpoint) semantics associated with CLP programs.

We will now formalize the connection between concurrent systems and CLP
programs. We assume that for each variable x there exists another variable x',
the primed version of x. We write $\tilde{x}$ for the tuple of variables $(x_1, \ldots, x_n)$ and
$\tilde{d}$ for the tuple of values $(d_1, \ldots, d_n)$. We denote validity of a first-order formula
$\psi$ wrt. a structure $\mathcal{D}$ and an assignment $\alpha$ by $\mathcal{D}, \alpha \models \psi$. As usual,
$\alpha[\tilde{x} \mapsto \tilde{d}]$ denotes an assignment in which the variables in $\tilde{x}$ are mapped to
the values in $\tilde{d}$. In the examples of Section 5, formulas will be interpreted over
the domains of integers and reals. Note, however, that the following presentation
is given for any structure $\mathcal{D}$.

A concurrent system (in the sense of [Sha93]) is a triple $(V, \Theta, \mathcal{E})$ such that

- $V$ is the tuple $\tilde{x}$ of control and data variables,

- $\Theta$ is a formula over $V$ called the initial condition,

- $\mathcal{E}$ is a set of pairs $(\psi, \phi)$ called events, where the enabling condition $\psi$ is a
formula over $V$ and the action $\phi$ is a formula of the form $x_1' = e_1 \wedge \ldots \wedge x_n' = e_n$
with expressions $e_1, \ldots, e_n$ over $V$.

The primed variable x' appearing in an action is used to represent the value
of x after the execution of an event. In the examples, we use the notation
cond $\psi$ action $\phi$ for the event $(\psi, \phi)$ (omitting conjuncts of the form $x' = x$).
The semantics of the concurrent system S is defined as a transition system
whose states are tuples $\tilde{d}$ of values in $\mathcal{D}$, and the transition relation $\tau$ is defined by

$\tau = \{\, (\tilde{d}, \tilde{d}') \mid \mathcal{D}, \alpha[\tilde{x} \mapsto \tilde{d}] \models \psi,\ \mathcal{D}, \alpha[\tilde{x} \mapsto \tilde{d}, \tilde{x}' \mapsto \tilde{d}'] \models \phi,\ (\psi, \phi) \in \mathcal{E} \,\}$.

The pre-condition operator $pre_S$ of the concurrent system S is defined through
the transition relation: $pre_S(S) = \{\, \tilde{d} \mid \text{there exists } \tilde{d}' \in S \text{ such that } (\tilde{d}, \tilde{d}') \in \tau \,\}$.
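When every action has the form $x_1' = e_1 \wedge \ldots \wedge x_n' = e_n$, each event maps a state to at most one successor, so $pre_S$ of a finite set can be computed by testing candidate states. The Python sketch below does exactly that, but only over a finite universe of candidates that we pick ourselves (the operator itself ranges over all of $\mathcal{D}$); events are encoded, hypothetically, as (guard, action) pairs of Python functions standing for the pairs $(\psi, \phi)$ of the bakery system.

```python
from itertools import product
from typing import Callable, Iterable, List, Set, Tuple

State = Tuple[str, str, int, int]  # (p1, p2, turn1, turn2)
Event = Tuple[Callable[[State], bool], Callable[[State], State]]  # (psi, phi)

# Events of the bakery system S (Figure 1), for both processes.
BAKERY: List[Event] = [
    (lambda s: s[0] == "think", lambda s: ("wait", s[1], s[3] + 1, s[3])),
    (lambda s: s[0] == "wait" and s[2] < s[3], lambda s: ("use",) + s[1:]),
    (lambda s: s[0] == "wait" and s[3] == 0, lambda s: ("use",) + s[1:]),
    (lambda s: s[0] == "use", lambda s: ("think", s[1], 0, s[3])),
    (lambda s: s[1] == "think", lambda s: (s[0], "wait", s[2], s[2] + 1)),
    (lambda s: s[1] == "wait" and s[3] < s[2], lambda s: (s[0], "use") + s[2:]),
    (lambda s: s[1] == "wait" and s[2] == 0, lambda s: (s[0], "use") + s[2:]),
    (lambda s: s[1] == "use", lambda s: (s[0], "think", s[2], 0)),
]


def pre(events: List[Event], universe: Iterable[State], target: Set[State]) -> Set[State]:
    """pre_S(target), restricted to the given candidate states."""
    return {d for d in universe
            if any(psi(d) and phi(d) in target for psi, phi in events)}


LABELS = ("think", "wait", "use")
UNIVERSE = list(product(LABELS, LABELS, range(6), range(6)))

print(("think", "think", 0, 3) in pre(BAKERY, UNIVERSE, {("wait", "think", 4, 3)}))  # True
```

Within this universe, the predecessors of (wait, think, 4, 3) are the six states (think, think, k, 3) with 0 ≤ k ≤ 5, because the first event of Process 1 overwrites $turn_1$.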

For the translation to CLP programs, we view the formulas for the enabling
condition and the action as constraints over the structure $\mathcal{D}$ (see [JM94]). We
introduce p for a dummy predicate symbol with arity n, and init for a predicate
with arity 0.²

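Definition 1 (Translation of concurrent systems to CLP programs)

The concurrent program S is encoded as the CLP program $P_S$ given below, if
$S = (V, \Theta, \mathcal{E})$ and V is the tuple of variables $\tilde{x}$.

$P_S = \{\, p(\tilde{x}) \leftarrow \psi \wedge \phi \wedge p(\tilde{x}') \mid (\psi, \phi) \in \mathcal{E} \,\} \cup \{\, init \leftarrow \Theta \wedge p(\tilde{x}) \,\}$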

The direct consequence operator $T_P$ associated with a CLP program P
(see [JM94]) is a function defined as follows: applied to a set S of atoms, it
yields the set of all atoms that are direct logical consequences of atoms in S
under the formula P. Formally,

$T_P(S) = \{\, p(\tilde{d}) \mid p(\tilde{d}) \leftarrow p(\tilde{d}') \text{ is an instance of a clause in } P,\ p(\tilde{d}') \in S \,\}$.

We obtain a (ground) instance by replacing all variables with values. In the next
statement we make implicit use of our convention of identifying states $\tilde{d}$ and
atoms $p(\tilde{d})$.

Theorem 1 (Adequacy of the translation $S \mapsto P_S$)

(i) The state sequences of the transition system defined by the concurrent sys-
tem S are exactly the ground derivations of the CLP program $P_S$.

(ii) The pre-condition operator of S is the logical consequence operator associated
with $P_S$, formally: $pre_S = T_{P_S}$.



Proof. The clause $p(\tilde{x}) \leftarrow \psi \wedge \phi \wedge p(\tilde{x}')$ of $P_S$ corresponds to the event $(\psi, \phi)$.
Its instances are of the form $p(\tilde{d}) \leftarrow p(\tilde{d}')$ where $\mathcal{D}, \alpha[\tilde{x} \mapsto \tilde{d}, \tilde{x}' \mapsto \tilde{d}'] \models \psi \wedge \phi$.
Thus, they correspond directly to the pairs $(\tilde{d}, \tilde{d}')$ of the transition relation $\tau$
restricted to the event $(\psi, \phi)$. This fact can be used to show (i) by induction
on the length of a sequence of transitions or derivations, and (ii) directly by
definition. □

² Note that, e.g., $p(think, P_2, Turn_1, Turn_2) \leftarrow \ldots$ in the notation used in examples
is equivalent to $p(P_1, P_2, Turn_1, Turn_2) \leftarrow P_1 = think \wedge \ldots$ in the notation used
in formal statements.
As an aside, if we translate S into the CLP program $P_S^{post}$ where

$P_S^{post} = \{\, p(\tilde{x}') \leftarrow p(\tilde{x}) \wedge \psi \wedge \phi \mid (\psi, \phi) \in \mathcal{E} \,\} \cup \{\, p(\tilde{x}) \leftarrow \Theta \,\}$

then the post-condition operator is the logical consequence operator associated
with $P_S^{post}$, formally: $post_S = T_{P_S^{post}}$. We thus obtain the characterization of the set
of reachable states as the least fixpoint of $T_{P_S^{post}}$.

3 Expressing CTL Properties in CLP 

We will use the temporal connectives: EF (exists finally), EG (exists globally), 
AF (always finally), AG (always globally) of CTL (Computation Tree Logic) to 
express safety and liveness properties of transition systems. Following [Eme90], 
we identify a temporal property with the set of states satisfying it. 

In the following, the notion of constrained facts will be important. A con-
strained fact is a clause $p(\tilde{x}) \leftarrow c$ whose body contains only a constraint c. Note
that an instance of a constrained fact is (equivalent to) a clause of the form
$p(\tilde{d}) \leftarrow true$, which is the same as the atom $p(\tilde{d})$, i.e. it is a state. Given a set of
constrained facts F, we write $[F]_{\mathcal{D}}$ for the set of instances of clauses in F (also
called the ‘meaning of F’ or the ‘set of states represented by F’). For example,
the meaning of

$F_{mut} = \{\, p(P_1, P_2, Turn_1, Turn_2) \leftarrow P_1 = use, P_2 = use \,\}$

is the set of states $[F_{mut}]_{\mathcal{D}} = \{\, p(use, use, 0, 0), p(use, use, 1, 0), \ldots \,\}$.

The application of a CTL operator on a set of constrained facts F is defined
in terms of the meaning of F. For example, $EF(F)$ is the set of all states from
which a state in $[F]_{\mathcal{D}}$ is reachable. In our examples, we will use a more intuitive
notation and write e.g. $EF(p_1 = p_2 = use)$ instead of $EF(F_{mut})$.

As an example of a safety property, consider mutual exclusion for the con-
current system S in Figure 1 (“the two processes are never in the critical section
at the same time”), expressed by $AG(\neg(p_1 = p_2 = use))$. Its complement is the
set of states $EF(p_1 = p_2 = use)$. As we can prove, this set is equal to the least
fixpoint for the program $P_S \oplus F_{mut}$ that we obtain from the union of the CLP
program $P_S$ in Figure 2 and the singleton set of constrained facts $F_{mut}$. We can
compute this fixpoint and show that it does not contain the initial state (i.e. the
atom init).

As an example of a liveness property, starvation freedom for Process 1
(“each time Process 1 waits, it will finally enter the critical section”) is ex-
pressed by $AG(p_1 = wait \rightarrow AF(p_1 = use))$. Its complement is the set of states
$EF(p_1 = wait \wedge EG(\neg\, p_1 = use))$. The set of states $EG(\neg\, p_1 = use)$ is equal to
the greatest fixpoint for the CLP program $P_S \otimes F_{starv}$ in Figure 3. We obtain
$P_S \otimes F_{starv}$ from the CLP program $P_S$ by a transformation wrt. the following
set of two constrained facts:

$F_{starv} = \{\, p(P_1, P_2, Turn_1, Turn_2) \leftarrow P_1 = think,\ \ p(P_1, P_2, Turn_1, Turn_2) \leftarrow P_1 = wait \,\}$.
$init \leftarrow Turn_1 = 0, Turn_2 = 0, p(think, think, Turn_1, Turn_2)$
$p(think, P_2, Turn_1, Turn_2) \leftarrow Turn_1' = Turn_2 + 1, p(wait, P_2, Turn_1', Turn_2)$
$p(wait, P_2, Turn_1, Turn_2) \leftarrow Turn_1 < Turn_2, p(use, P_2, Turn_1, Turn_2)$
$p(wait, P_2, Turn_1, Turn_2) \leftarrow Turn_2 = 0, p(use, P_2, Turn_1, Turn_2)$
$p(wait, think, Turn_1, Turn_2) \leftarrow Turn_2' = Turn_1 + 1, p(wait, wait, Turn_1, Turn_2')$
$p(wait, wait, Turn_1, Turn_2) \leftarrow Turn_2 < Turn_1, p(wait, use, Turn_1, Turn_2)$
$p(wait, wait, Turn_1, Turn_2) \leftarrow Turn_1 = 0, p(wait, use, Turn_1, Turn_2)$
$p(wait, use, Turn_1, Turn_2) \leftarrow Turn_2' = 0, p(wait, think, Turn_1, Turn_2')$
$p(think, think, Turn_1, Turn_2) \leftarrow Turn_2' = Turn_1 + 1, p(think, wait, Turn_1, Turn_2')$
$p(think, wait, Turn_1, Turn_2) \leftarrow Turn_2 < Turn_1, p(think, use, Turn_1, Turn_2)$
$p(think, wait, Turn_1, Turn_2) \leftarrow Turn_1 = 0, p(think, use, Turn_1, Turn_2)$
$p(think, use, Turn_1, Turn_2) \leftarrow Turn_2' = 0, p(think, think, Turn_1, Turn_2')$

Fig. 3. The CLP program $P_S \otimes F_{starv}$ for the concurrent system S in Figure 1.



The transformation amounts to constraining the clauses $p(label_1, \_, \_, \_) \leftarrow \ldots$ in
$P_S$ to those where $label_1$ is either wait or think (i.e., clauses of the form
$p(use, \_, \_, \_) \leftarrow \ldots$ are removed).

To give an idea of the model checking method that we will describe
in the next section: in an intermediate step, the method computes a set F'
of constrained facts such that the set of states $[F']_{\mathcal{D}}$ is equal to the greatest
fixpoint for the CLP program $P_S \otimes F_{starv}$. The method uses the set F' to form a
third CLP program $P_S \oplus (F_{wait} \wedge F')$, where $F_{wait}$ consists of the constrained fact
$p(P_1, P_2, Turn_1, Turn_2) \leftarrow P_1 = wait$. The least fixpoint for that program is equal to
$EF(p_1 = wait \wedge EG(\neg\, p_1 = use))$. For more details, see Corollary 21 below.

We will now formalize the general setting. 

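Definition 2 Given a CLP program P and a set of constrained facts F, we
define the CLP programs $P \oplus F$ and $P \otimes F$ as follows.

$P \oplus F = P \cup F$

$P \otimes F = \{\, p(\tilde{x}) \leftarrow c_1 \wedge c_2 \wedge p(\tilde{x}') \mid p(\tilde{x}) \leftarrow c_1 \wedge p(\tilde{x}') \in P,\ p(\tilde{x}) \leftarrow c_2 \in F \,\}$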
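Definition 2 is purely syntactic, and its effect on the ground operator is easy to observe on a toy program. In the hypothetical Python sketch below (the names `Clause`, `oplus`, `otimes` and `t_p` are ours), constraints are represented as Python predicates on ground values instead of symbolic constraints, and $T_P$ is evaluated over a finite universe. It shows that $T_{P \oplus F}(X)$ adds $[F]_{\mathcal{D}}$ to $T_P(X)$, whereas $T_{P \otimes F}(X)$ keeps only the part of $T_P(X)$ that lies in $[F]_{\mathcal{D}}$.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Clause:
    """p(x) <- c(x, x'), p(x') if recursive; otherwise the constrained fact p(x) <- c(x)."""
    constraint: Callable[..., bool]
    recursive: bool


def oplus(program: List[Clause], facts: List[Clause]) -> List[Clause]:
    return program + facts  # P (+) F = P u F


def otimes(program: List[Clause], facts: List[Clause]) -> List[Clause]:
    # P (x) F: conjoin each fact's constraint c2 with each clause p(x) <- c1, p(x')
    return [Clause(lambda x, y, c1=cl.constraint, c2=f.constraint: c1(x, y) and c2(x), True)
            for cl in program if cl.recursive for f in facts]


def t_p(program: List[Clause], universe: Iterable[int], atoms: Set[int]) -> Set[int]:
    """Ground direct consequence operator, restricted to a finite universe."""
    result = set()
    for d in universe:
        for cl in program:
            if cl.recursive and any(cl.constraint(d, d2) for d2 in atoms):
                result.add(d)
            elif not cl.recursive and cl.constraint(d):
                result.add(d)
    return result


P = [Clause(lambda x, y: y == x + 1, True)]  # p(x) <- x' = x + 1, p(x')
F = [Clause(lambda x: x <= 2, False)]        # p(x) <- x <= 2
X = {3, 4}
print(sorted(t_p(P, range(6), X)))             # [2, 3]
print(sorted(t_p(oplus(P, F), range(6), X)))   # [0, 1, 2, 3]
print(sorted(t_p(otimes(P, F), range(6), X)))  # [2]
```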

Theorem 2 (CTL properties and CLP program semantics)

Given a concurrent system S and its translation to the CLP program $P_S$,
the following properties hold for all sets of constrained facts F.

$EF(F) = lfp(T_{P_S \oplus F})$

$EG(F) = gfp(T_{P_S \otimes F})$

Proof. Follows from the fixpoint characterizations of CTL properties
(see [Eme90]) and Theorem 1. □
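On a finite transition graph, both characterizations can be evaluated directly: ignoring the init clause, $T_{P_S \oplus F}(X) = [F]_{\mathcal{D}} \cup pre_S(X)$ and $T_{P_S \otimes F}(X) = [F]_{\mathcal{D}} \cap pre_S(X)$ by Theorem 1 and Definition 2. The Python sketch below is only a finite-state illustration with an explicit successor map of our own choosing; the point of the approach is precisely to perform such iterations symbolically when the state space is infinite.

```python
from typing import Callable, Dict, Set

Graph = Dict[int, Set[int]]  # state -> set of successor states (every state is a key)


def pre(graph: Graph, target: Set[int]) -> Set[int]:
    return {s for s, succ in graph.items() if succ & target}


def fixpoint(step: Callable[[Set[int]], Set[int]], start: Set[int]) -> Set[int]:
    """Iterate step from start until two consecutive iterates coincide."""
    current = start
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt


def ef(graph: Graph, f: Set[int]) -> Set[int]:
    # least fixpoint of X -> F u pre(X), starting from the empty set
    return fixpoint(lambda x: f | pre(graph, x), set())


def eg(graph: Graph, f: Set[int]) -> Set[int]:
    # greatest fixpoint of X -> F n pre(X), starting from the set of all states
    return fixpoint(lambda x: f & pre(graph, x), set(graph))


G = {0: {1, 3}, 1: {2}, 2: {2}, 3: {4}, 4: set()}
print(sorted(ef(G, {4})))           # [0, 3, 4]
print(sorted(eg(G, {0, 1, 2, 3})))  # [0, 1, 2]
```

State 3 drops out of the greatest fixpoint because its only successor leaves F, so no infinite path through 3 stays inside F.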

By duality, we have that $AF(\neg F)$ is the complement of $gfp(T_{P_S \otimes F})$ and $AG(\neg F)$
is the complement of $lfp(T_{P_S \oplus F})$. We next single out two important CTL prop-
erties that we have used in the examples in order to express mutual exclusion
and absence of individual starvation, respectively.
Corollary 21 (Safety and Liveness)

(i) The concurrent system S satisfies the safety property $AG(\neg F)$ if and only
if the atom ‘init’ is not in the least fixpoint for the CLP program $P_S \oplus F$.
(ii) S satisfies the liveness property $AG(F_1 \rightarrow AF(\neg F_2))$ if and only if ‘init’ is
not in the least fixpoint for the CLP program $P_S \oplus (F_1 \wedge F')$, where F' is a
set of constrained facts denoting the greatest fixpoint for the CLP program
$P_S \otimes F_2$.



For the constraints considered in the examples, the sets of constrained facts are
effectively closed under negation (denoting complement). Conjunction (denoting
intersection) can always be implemented as
$F \wedge F' = \{\, p(\tilde{x}) \leftarrow c_1 \wedge c_2 \mid p(\tilde{x}) \leftarrow c_1 \in F,\ p(\tilde{x}) \leftarrow c_2 \in F',\ c_1 \wedge c_2 \text{ is satisfiable in } \mathcal{D} \,\}$.
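The conjunction of two sets of constrained facts only needs pairwise conjunction plus a satisfiability test. To keep the sketch self-contained, the Python code below assumes a much simpler constraint domain than the linear arithmetic handled by a CLP solver: each constraint is a conjunction of bounds lo ≤ x_i ≤ hi over the reals (a "box"), one interval per numeric variable, and control labels are ignored. In this domain, conjunction is interval intersection and satisfiability is non-emptiness. The names `meet` and `conj` are hypothetical.

```python
from typing import List, Optional, Tuple

INF = float("inf")
Interval = Tuple[float, float]  # closed interval [lo, hi] over the reals
Box = Tuple[Interval, ...]      # one interval per variable: a conjunction of bounds


def meet(c1: Box, c2: Box) -> Optional[Box]:
    """The conjunction of c1 and c2, or None if it is unsatisfiable."""
    out = tuple((max(a1, a2), min(b1, b2)) for (a1, b1), (a2, b2) in zip(c1, c2))
    return out if all(lo <= hi for lo, hi in out) else None


def conj(f: List[Box], g: List[Box]) -> List[Box]:
    """F and F': all satisfiable pairwise conjunctions of constrained facts."""
    result = []
    for c1 in f:
        for c2 in g:
            c = meet(c1, c2)
            if c is not None:
                result.append(c)
    return result


F = [((0, 5), (0, INF))]
G = [((3, 10), (1, 2)), ((6, 9), (0, 0))]
print(conj(F, G))  # [((3, 5), (1, 2))]
```

The second fact of G is discarded because its first interval [6, 9] does not intersect [0, 5].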

4 Defining a Model Checking Method 

It is important to note that temporal properties are undecidable for the general 
class of concurrent systems that we consider. Thus, the best we can hope for 
are ‘good’ semi-algorithms, in the sense of Wolper in [BW98]: “the determining 
factor will be how often they succeed on the instances for which verification is 
indeed needed” (which is, in fact, similar to the situation for most decidable 
verification problems [BW98]). 

A set F of constrained facts is an implicit representation of the (possibly
infinite) set of states S if $S = [F]_{\mathcal{D}}$. From now on, we always assume that F itself
is finite. We will replace the operator $T_P$ over sets of atoms (i.e. states) by the
operator $S_P$ over sets of constrained facts, whose application $S_P(F)$ is effectively
computable. If the CLP program P is an encoding of a concurrent system, we
can define $S_P$ as follows (note that F is closed under renaming of variables since
clauses are implicitly universally quantified; i.e., if $p(x_1, \ldots, x_n) \leftarrow c \in F$ then
also $p(x_1', \ldots, x_n') \leftarrow c[x_1'/x_1, \ldots, x_n'/x_n] \in F$).

$S_P(F) = \{\, p(\tilde{x}) \leftarrow c_1 \wedge c_2 \mid p(\tilde{x}) \leftarrow c_1 \wedge p(\tilde{x}') \in P,\ p(\tilde{x}') \leftarrow c_2 \in F,\ c_1 \wedge c_2 \text{ is satisfiable in } \mathcal{D} \,\}$

If P also contains constrained facts $p(\tilde{x}) \leftarrow c$, then these are always contained
in $S_P(F)$.
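To make the $S_P$ operator concrete, the Python sketch below implements it for a deliberately restricted, hypothetical class of clauses rather than for arbitrary linear constraints: each clause has the form $p(\tilde{x}) \leftarrow guard(\tilde{x}), \tilde{x}' = \tilde{x} + k, p(\tilde{x}')$ with a box guard and a constant shift vector k, and constraints are boxes as before. For such clauses, eliminating $\tilde{x}'$ from $c_1 \wedge c_2$ amounts to shifting the box $c_2$ back by k and intersecting it with the guard; the constrained facts of P are always included, as stated above.

```python
from typing import List, Optional, Tuple

INF = float("inf")
Box = Tuple[Tuple[float, float], ...]   # conjunction of bounds lo <= x_i <= hi
Clause = Tuple[Box, Tuple[float, ...]]  # p(x) <- guard(x), x' = x + shift, p(x')


def meet(c1: Box, c2: Box) -> Optional[Box]:
    out = tuple((max(a1, a2), min(b1, b2)) for (a1, b1), (a2, b2) in zip(c1, c2))
    return out if all(lo <= hi for lo, hi in out) else None


def s_p(clauses: List[Clause], program_facts: List[Box], f: List[Box]) -> List[Box]:
    """One application of S_P for this restricted class of clauses."""
    result = list(program_facts)             # constrained facts of P are always included
    for guard, shift in clauses:
        for c2 in f:
            # c2 constrains x'; since x' = x + shift, it constrains x by the box shifted back
            pulled = tuple((lo - k, hi - k) for (lo, hi), k in zip(c2, shift))
            c = meet(guard, pulled)          # c1 /\ c2, kept only if satisfiable
            if c is not None:
                result.append(c)
    return result


CLAUSES: List[Clause] = [(((0, INF),), (1,))]  # p(x) <- x >= 0, x' = x + 1, p(x')
print(s_p(CLAUSES, [], [((5, 5),)]))  # [((4, 4),)]
```

Applied to the fact x = 0, the same clause produces nothing, because the shifted constraint x = -1 contradicts the guard x ≥ 0.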

The $S_P$ operator has been introduced to study the non-ground semantics of
CLP programs in [GDL95], where its connection to the ground semantics is also
investigated: the set of ground instances of a fixpoint of the $S_P$ operator is the
corresponding fixpoint of the $T_P$ operator, formally $lfp(T_P) = [lfp(S_P)]_{\mathcal{D}}$ and
$gfp(T_P) = [gfp(S_P)]_{\mathcal{D}}$. Thus, Theorem 2 leads to the characterization of CTL
properties through the $S_P$ operator via:

$EF(F) = [lfp(S_{P_S \oplus F})]_{\mathcal{D}}$,

$EG(F) = [gfp(S_{P_S \otimes F})]_{\mathcal{D}}$.
Now, a (possibly non-terminating) model checker can be defined in a straight-
forward way. It consists of the manipulation of constrained facts as implicit
representations of (in general, infinite) sets of states. It is based on standard
fixpoint iteration of $S_P$ operators for the specific programs P according to the
fixpoint definition of the CTL properties to be computed (see e.g. Corollary 21).
An iteration starts either with $F = \emptyset$, representing the empty set of states, or
with $F = \{p(\tilde{x}) \leftarrow true\}$, representing the set of all states. Computing the
application of the $S_P$ operator to a set of constrained facts F consists in
scanning all pairs of clauses in P and constrained facts in F and checking the
satisfiability of constraints; it produces a new (finite) set of constrained facts.

The iteration yields a (possibly infinite) sequence $F_0, F_1, F_2, \ldots$ of sets of
constrained facts. The iteration stops at i if the sets of states represented by $F_i$
and $F_{i+1}$ are equal, formally $[F_i]_{\mathcal{D}} = [F_{i+1}]_{\mathcal{D}}$.

The fixpoint of the $S_P$ operator is taken wrt. the subsumption ordering be-
tween sets of constrained facts. We say that F is subsumed by F' if the set
of states represented by F is contained in the set of states represented by F',
formally $[F]_{\mathcal{D}} \subseteq [F']_{\mathcal{D}}$. Testing subsumption amounts to testing entailment of
disjunctions of constraints by constraints.

We interleave the least fixpoint iteration with the test of membership of the 
state init in the intermediate results; this yields a semi-algorithm for safety 
properties. 

We next describe some optimizations that have proved useful in our ex-
periments (described in the next section). Our point here is to demonstrate that
the CLP setting, with its combination of mathematical and logical reasoning,
allows one to find these optimizations naturally.

Local subsumption. For practical reasons, one may consider replacing sub- 
sumption by local subsumption as the fixpoint test. We say that F is locally 
subsumed by F' if every constrained fact in F is subsumed by some constrained 
fact in F'. Testing local subsumption amounts to testing entailment between 
quadratically many combinations of constraints. Generally, the fixpoint test may 
become strictly weaker but is more efficient, practically (an optimized entailment 
test for constraints is available in all modern CLP systems) and theoretically. 
For linear arithmetic constraints, for example, subsumption is prohibitively hard 
(co-NP [Sri93]) and local subsumption is polynomial [Sri93]. An abstract study 
of the complexity of local vs. full subsumption based on CLP techniques can be 
found in [Mah95]; he shows that (full) subsumption is co-NP-hard unless it is 
equivalent to local subsumption. 
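The gap between local and full subsumption already shows up with one variable. The Python sketch below again assumes box constraints over the reals (a hypothetical simplification, not the linear constraints of the paper): `locally_subsumed` checks that every constrained fact of F is entailed by a single fact of F', while `covered_1d` decides full subsumption in one dimension by sweeping over the union of intervals. The fact 0 ≤ x ≤ 10 is fully subsumed by {0 ≤ x ≤ 5, 5 ≤ x ≤ 10}, but not locally, which is why the local test is a weaker fixpoint test.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]


def entails(c: Box, d: Box) -> bool:
    """c entails d: the (non-empty) box c lies inside the box d."""
    return all(d_lo <= c_lo and c_hi <= d_hi for (c_lo, c_hi), (d_lo, d_hi) in zip(c, d))


def locally_subsumed(f: List[Box], g: List[Box]) -> bool:
    """Every constrained fact of F is subsumed by a single constrained fact of G."""
    return all(any(entails(c, d) for d in g) for c in f)


def covered_1d(target: Interval, pieces: List[Interval]) -> bool:
    """Full subsumption in one dimension: does the union of pieces cover target?"""
    lo, hi = target
    reach = lo
    for a, b in sorted(pieces):
        if a > reach:
            break  # the points strictly between reach and a are not covered
        reach = max(reach, b)
    return reach >= hi


print(locally_subsumed([((0, 10),)], [((0, 5),), ((5, 10),)]))  # False
print(covered_1d((0, 10), [(0, 5), (5, 10)]))                   # True
```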

Elimination of redundant facts. We call a set of constrained facts $\mathcal{F}$ irredundant if no element subsumes another one. We keep all sets of constrained facts $\mathcal{F}_1, \mathcal{F}_2, \ldots$ irredundant during the least fixpoint iteration by checking whether a new constrained fact in $\mathcal{F}_{i+1}$ that is not locally subsumed by $\mathcal{F}_i$ itself subsumes (and thus makes redundant) a constrained fact in $\mathcal{F}_i$. This technique is standard in CLP fixpoint computations [MR89].
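A minimal Python sketch of this bookkeeping, again assuming interval facts (loc, lo, hi) as a stand-in for CLP constraints: a new fact is discarded if some stored fact entails it; otherwise it is added and every stored fact it entails is removed. The names `add_irredundant` and `irredundant` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical interval facts (loc, lo, hi); entailment is interval inclusion.
Fact = Tuple[str, float, float]


def entails(f: Fact, g: Fact) -> bool:
    return f[0] == g[0] and g[1] <= f[1] and f[2] <= g[2]


def add_irredundant(current: List[Fact], new_facts: List[Fact]) -> List[Fact]:
    """Merge new facts into an irredundant set of constrained facts."""
    result = list(current)
    for f in new_facts:
        if any(entails(f, g) for g in result):
            continue  # f is locally subsumed: it adds nothing
        # f is kept; drop every stored fact that f subsumes (now redundant).
        result = [g for g in result if not entails(g, f)]
        result.append(f)
    return result


def irredundant(facts: List[Fact]) -> bool:
    """No element subsumes another one."""
    return not any(entails(f, g) for i, f in enumerate(facts)
                   for j, g in enumerate(facts) if i != j)
```

For example, merging [("a", 1, 2), ("a", 0, 6)] into [("a", 0, 3), ("a", 5, 6)] drops the first new fact and lets the second one replace both stored facts.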

Strategies. We obtain different fixpoint evaluation strategies (essentially, mixed forms of backward and forward analysis) by applying transformations such as the magic-sets templates algorithm [RSS92] to the CLP programs $P_S \oplus \mathcal{F}$. Such transformations are natural in the context of CLP programs, which may also be viewed as constraint databases (see [RSS92,Rev93]).

The application of a kind of magic-set transformation to the CLP program $P = P_S \oplus \mathcal{F}$, where the clauses have a restricted form (one or no predicate in the body), yields the following CLP program $\widehat{P}$ (with new predicates $\widehat{p}$ and $\widehat{\mathit{init}}$).

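$$
\begin{aligned}
\widehat{P} = {} & \{\, p(x) \leftarrow \mathit{body}, \widehat{p}(x) \mid p(x) \leftarrow \mathit{body} \in P \,\} \\
\cup {} & \{\, \widehat{p}(x') \leftarrow c, \widehat{p}(x) \mid p(x) \leftarrow c, p(x') \in P \,\} \\
\cup {} & \{\, \widehat{\mathit{init}} \leftarrow \mathit{true} \,\}
\end{aligned}
$$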

We obtain the soundness of this transformation wrt. the verification of safety properties by standard results [RSS92], which say that $\mathit{init} \in \mathit{lfp}(T_P)$ if and only if $\mathit{init} \in \mathit{lfp}(T_{\widehat{P}})$ (that is, $\mathit{init} \in \mathit{lfp}(S_{\widehat{P}})$). Soundness continues to hold if we replace the constraints $c$ in the clauses $\widehat{p}(x') \leftarrow c, \widehat{p}(x)$ in $\widehat{P}$ by constraints that are entailed by $c$. We thus obtain a whole spectrum of transformations through the different possibilities to weaken constraints. In our example, if we weaken the arithmetical constraints to $\mathit{true}$, then the first iterations amount to eliminating constrained facts $p(\mathit{label}_1, \mathit{label}_2, \ldots)$ whose locations $\mathit{label}_1, \mathit{label}_2$ are "definitely" not reachable from the initial state.
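The transformation is purely syntactic, so it can be sketched as a rewrite on clause data. The Python code below is our own illustration, not the authors' implementation: constraints and argument tuples are kept as opaque strings, `magic_p` stands for $\widehat{p}$, we read $\widehat{\mathit{init}}$ as the magic atom for the initial state (an assumption), and `weaken` is a hypothetical hook that must return a constraint entailed by its input.

```python
from typing import Callable, List, NamedTuple, Tuple


class Atom(NamedTuple):
    pred: str
    args: str  # argument tuple kept as opaque text, e.g. "X" or "X1"


class Clause(NamedTuple):
    head: Atom
    constraint: str         # opaque constraint text, e.g. "X1 = X + 1"
    body: Tuple[Atom, ...]  # restricted form: at most one atom


def magic(program: List[Clause], init_args: str, pred: str = "p",
          weaken: Callable[[str], str] = lambda c: c) -> List[Clause]:
    """Sketch of the magic-set rewrite; 'magic_<pred>' plays the role of p-hat."""
    out = []
    for cl in program:
        m_head = Atom("magic_" + cl.head.pred, cl.head.args)
        # p(x) <- body, p^(x): only derive facts about states marked reachable.
        out.append(Clause(cl.head, cl.constraint, cl.body + (m_head,)))
        if cl.body:
            (b,) = cl.body
            # p(x) <- c, p(x')  yields  p^(x') <- c', p^(x), where c |= c'.
            out.append(Clause(Atom("magic_" + b.pred, b.args),
                              weaken(cl.constraint), (m_head,)))
    # Seed: the magic atom for the initial state.
    out.append(Clause(Atom("magic_" + pred, init_args), "true", ()))
    return out


# Hypothetical program: X >= 10 is unsafe; state X has successor X1 = X + 1.
PROGRAM = [
    Clause(Atom("p", "X"), "X >= 10", ()),
    Clause(Atom("p", "X"), "X1 = X + 1", (Atom("p", "X1"),)),
]
```

Passing `weaken=lambda c: "true"` corresponds to the extreme end of the spectrum mentioned above, where the magic clauses only propagate control information.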

Abstraction. We define an approximation $\widetilde{S}_P$ of the $S_P$ operator in the style of the abstract interpretation framework, whose results guarantee that we obtain conservative approximations of the fixpoints and, hence, of the CTL properties. This approximation turns our method into a (possibly non-terminating) semi-test for AF and AG properties, in the following sense: only a positive answer is a definite answer.

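We introduce a new widening operator $\Uparrow$ (in the style of [CH78], but without a termination guarantee) and then define $\widetilde{S}_P(\mathcal{F}) = \mathcal{F} \Uparrow S_P(\mathcal{F})$ (so that $[\![S_P(\mathcal{F})]\!]_{\mathcal{D}} \subseteq [\![\widetilde{S}_P(\mathcal{F})]\!]_{\mathcal{D}}$). The operator $\Uparrow$ is defined in terms of constrained facts. For example, if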

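$\mathcal{F} = \{\, p(X, Y) \leftarrow X > 0, Y > 0, X < Y \,\}$ and

$\mathcal{F}' = \{\, p(X, Y) \leftarrow X > 0, Y > 0, X < Y + 1 \,\}$, then

$\mathcal{F} \Uparrow \mathcal{F}' = \{\, p(X, Y) \leftarrow X > 0, Y > 0 \,\}$.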

Formally, $\mathcal{F} \Uparrow \mathcal{F}'$ contains each constrained fact that is obtained from some constrained fact $p(x) \leftarrow c_1 \wedge \ldots \wedge c_n$ in $\mathcal{F}'$ by removing all conjuncts $c_i$ that are strictly entailed by some conjunct $d_j$ of some 'compatible' constrained atom $p(x) \leftarrow d_1 \wedge \ldots \wedge d_m$ in $\mathcal{F}$, where 'compatible' means that the conjunction $c_1 \wedge \ldots \wedge c_n \wedge d_1 \wedge \ldots \wedge d_m$ is satisfiable. This condition restricts the applications of the widening operator, e.g., to facts with the same values for the control locations.

In contrast with the 'standard' widening operators in [CH78] and the refined versions in [HPR97,BGP98], the operator $\Uparrow$ can be directly implemented using the entailment test between constraints; furthermore, it is applied fact-by-fact, i.e., without requiring a preliminary computation of the convex hull of the union of polyhedra. Besides being computationally expensive, the convex hull approximation may be an important source of loss of precision. Consider, e.g., the two sets of constrained atoms

$\mathcal{F} = \{\, p(\ell, X) \leftarrow X > 2 \,\}$

$\mathcal{F}' = \{\, p(\ell, X) \leftarrow X > 2,\;\; p(\ell, X) \leftarrow X < 0 \,\}$.

When applied to $\mathcal{F}$ and $\mathcal{F}'$, each of the widening operators in [BGP98,CH78,HPR97] returns the (polyhedron denoted by the) fact $p(\ell, X) \leftarrow \mathit{true}$. In contrast, our widening is precise here, i.e., it returns $\mathcal{F}'$. Note that the use of constrained facts automatically induces a partitioning of the state space wrt. the set of control locations; such a partitioning has proved useful to increase the precision of the widening operator (essentially, by reducing its applicability; see e.g. [HPR97,BGP98]).
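The operator $\Uparrow$ can be sketched in Python on a restricted constraint class that we chose for illustration (an assumption, not the paper's setting): difference constraints x - y ≤ c over the integers, with a distinguished variable "0" for bounds. For this class, satisfiability is a negative-cycle test, and we approximate entailment between two atomic conjuncts syntactically (same pair of variables, smaller bound). Control locations are kept as a separate field, so 'compatible' requires equal locations plus a satisfiable conjunction. Strict inequalities are rewritten over the integers, e.g. X < Y as X - Y ≤ -1.

```python
from typing import List, Optional, Tuple

# Atomic constraint (x, y, c) means x - y <= c; "0" is a constant-zero variable.
Atom = Tuple[str, str, int]
# Constrained fact: (control location or None, conjunction of atoms).
Fact = Tuple[Optional[str], List[Atom]]


def satisfiable(cons: List[Atom]) -> bool:
    """Difference constraints are unsatisfiable iff their graph has a negative cycle."""
    if not cons:
        return True
    nodes = {v for x, y, _ in cons for v in (x, y)}
    dist = {v: 0 for v in nodes}  # as if a virtual source reaches every node at 0
    for _ in range(len(nodes)):
        changed = False
        for x, y, c in cons:  # edge y -> x with weight c
            if dist[y] + c < dist[x]:
                dist[x] = dist[y] + c
                changed = True
        if not changed:
            return True
    return False  # still relaxing after |V| rounds: negative cycle


def strictly_entails(d: Atom, c: Atom) -> bool:
    """d |= c but not c |= d (syntactic check on the same variable pair)."""
    return d[0] == c[0] and d[1] == c[1] and d[2] < c[2]


def widen(F: List[Fact], F2: List[Fact]) -> List[Fact]:
    """F widen F2: drop conjuncts of F2's facts strictly entailed by compatible F facts."""
    result = []
    for loc, cs in F2:
        compatible = [ds for l, ds in F if l == loc and satisfiable(cs + ds)]
        kept = [c for c in cs
                if not any(strictly_entails(d, c) for ds in compatible for d in ds)]
        result.append((loc, kept))
    return result


# First example: X > 0, Y > 0, X < Y   versus   X > 0, Y > 0, X < Y + 1.
F1 = [(None, [("0", "X", -1), ("0", "Y", -1), ("X", "Y", -1)])]
F1p = [(None, [("0", "X", -1), ("0", "Y", -1), ("X", "Y", 0)])]
# Second example: p(l, X) <- X > 2   versus   {X > 2, X < 0} at location l.
G = [("l", [("0", "X", -3)])]
Gp = [("l", [("0", "X", -3)]), ("l", [("X", "0", -1)])]
```

Both examples behave as described in the text: in the first, the conjunct X < Y + 1 is dropped; in the second, the fact for X < 0 is incompatible with X > 2, so $\mathcal{F}'$ is returned unchanged instead of collapsing to $p(\ell, X) \leftarrow \mathit{true}$.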



5 Experimentation in CLP 

We have implemented the model checking procedure described above in SICStus Prolog 3.7.1 using the CLP(Q,R) library [Hol95] and the Boolean constraint solvers (which are implemented with BDDs). We made extensive use of the runtime database facilities for storing and retrieving constrained facts, and of the meta-programming facilities (e.g., the interchangeability between uninterpreted and interpreted constraint expressions).

We have applied the implementation to several infinite-state verification problems that are becoming benchmarks in the community (see e.g. [BGP97,BGP98,SKR98,SUM96,LS97]). This allowed us to evaluate the performance of our implementation, to experiment with evaluation strategies and abstractions through widenings, and to compare our solution with previous solutions.

We implement the solving of constraints over integers, which is needed for model checking integer-valued concurrent systems, through a constraint solver over reals. We thus trade an extra abstraction for a theoretical and practical gain in efficiency. This abstraction yields a conservative approximation of CTL properties (by standard fixpoint theory). In our experiments, we did not incur a loss of precision. It would be interesting to characterize in general the integer-valued concurrent systems for which the abstraction of integer constraints to the reals is always precise.

We now briefly comment on the experimental results listed in Fig. 4. All the verification problems have been tested on a Sun Sparc Station 4, OS 5.5.1.

Fig. 4. Benchmarks for the verification of safety properties; C: number of clauses, E: exact, A: approximation with widening, R: elimination of redundant facts, T: execution time (in seconds), N: number of produced facts, —: not needed, ↑: non-termination.

Mutual exclusion and starvation freedom for the bakery algorithm (see Sect. 2 and Sect. 3) can be verified without the use of widening (execution time for starvation freedom: 0.9s). In versions of the bakery algorithm for 3 and 4 processes (not treated in [BGP97]), a maximum operator (used in assignments of priorities such as $\mathit{Turn}_1 = \max(\mathit{Turn}_2, \mathit{Turn}_3) + 1$) is encoded case-by-case in the constraint representation. This makes the program size grow exponentially in the number of processes. Although the time cost here still seems reasonable, more experiments are needed to truly check scalability.
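To illustrate what encoding the maximum case-by-case means, the Python sketch below (our own illustration; the clauses actually used in the experiments are not reproduced here) splits turn[i] := max(turn[j] | j ≠ i) + 1 into n - 1 guarded alternatives, one per candidate maximum, and checks by enumeration on a small grid that the enabled alternatives always agree with max. The names `max_cases`, `apply_cases` and `check_encoding` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

# One alternative (j, guards): if turn[j] >= turn[k] for all k in guards,
# then the new value of turn[i] is turn[j] + 1.
Case = Tuple[int, List[int]]


def max_cases(i: int, n: int) -> List[Case]:
    """Case split of turn[i] := max(turn[j] for j != i) + 1 for n processes."""
    others = [j for j in range(n) if j != i]
    return [(j, [k for k in others if k != j]) for j in others]


def apply_cases(cases: List[Case], turn: Sequence[int]) -> Set[int]:
    """All values that the enabled alternatives assign in state `turn`."""
    return {turn[j] + 1 for j, guards in cases
            if all(turn[j] >= turn[k] for k in guards)}


def check_encoding(n: int, values: int = 3) -> bool:
    """Exhaustively compare the case split for process 0 with max()."""
    cases = max_cases(0, n)
    return len(cases) == n - 1 and all(
        apply_cases(cases, turn) == {max(turn[1:]) + 1}
        for turn in product(range(values), repeat=n))
```

For three processes, `max_cases(0, 3)` yields two alternatives, guarded by turn[1] >= turn[2] and turn[2] >= turn[1]; each such assignment in the program is replaced by its alternatives.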

The ticket algorithm (see [BGP97]) is based on ideas similar to those of the bakery algorithm. Here, priorities are maintained through two global variables and two local variables. As in [BGP97], we needed to apply widening to prove safety. In a second experiment we applied the magic-set transformation instead and obtained a proof in 0.6s. We proved starvation freedom in 3.0s by applying widening for the outer least fixpoint (the inner one for the greatest fixpoint terminates without abstraction).

The algorithm mut-ast (see [LS97]) is also designed to ensure mutual exclusion. We have translated the description in [LS97] of a network of an arbitrary, non-fixed number of mut-ast processes into a CLP program and proved safety using abstraction (network).

The other examples are producer-consumer algorithms. The algorithm bbuffer (see [BGP98]) coordinates a system of two producers and two consumers connected by a buffer of bounded size. We proved two invariants: the difference between produced and consumed items is always equal to the number of items currently present in the buffer (bbuffer(1)), and the number of free slots always ranges between zero and the maximum size of the buffer (bbuffer(2)). The algorithm ubuffer (see [BGP98]) coordinates a system with one producer and one consumer connected by two unbounded buffers. We have proved the invariant that the number of consumed items is always less than that of produced ones.

A prototypical version of our model checker (SICStus source code, together with the code of the verification problems considered in this section and the outcomes of the fixpoint computations) is available at the URL www.mpi-sb.mpg.de/~delzanno/clp.html.
6 Related Work

There have been other attempts to connect logic programming and verification, none of which has our generality with respect to the applicable concurrent systems and temporal properties. In [FR96], Fribourg and Richardson use CLP programs over gap-order integer constraints [Rev93] in order to compute the set of reachable states for a 'decidable' class of infinite-state systems. Constraints of the form $x = y + 1$ (as needed in our examples) are not gap-order constraints. In [FO97], Fribourg and Olsen study reachability for systems with integer counters. These approaches are restricted to safety properties.

In [Rau94], Rauzy describes a CLP-style extension of the propositional μ-calculus to finite-domain constraints, which can be used for model checking of finite-state systems. In [Urb96], Urbina singles out a class of CLP(R) programs that he baptizes 'hybrid systems' without, however, investigating their formal connection with hybrid system specifications; note that liveness properties of timed or hybrid automata cannot be directly expressed through fixpoints of the $S_P$ operator (because the clauses translating time transitions may loop). In [GP97], Gupta and Pontelli describe runs of timed automata using the top-down operational semantics of CLP programs (and not the fixpoint semantics). In [CP98], Charatonik and Podelski show that set-based program analysis can be used as an always terminating algorithm for the approximation of CTL properties for (traditional) logic programs specifying extensions of pushdown processes. In [RRR+97], a logic programming language based on tabling called XSB is used to implement an efficient local model checker for finite-state systems specified in a CCS-like value-passing language. The integration of tabling with constraints is possible in principle and has promising potential.

As described in [LLPY97], constraints as symbolic representations of states are used in UPPAAL, a verification tool for timed systems [LPY97]. It seems that, for reasons of syntax, it is not possible to verify safety for our examples in the current version of UPPAAL (but possibly in an extension). Note that UPPAAL can check bounded liveness properties only, which excludes, e.g., starvation freedom.

We next discuss work on other verification procedures for integer-valued systems. In [BGP97,BGP98], Bultan, Gerber and Pugh use the Omega library for Presburger arithmetic as their implementation platform. Their work directly stimulated ours; we took over their examples of verification problems. The execution times (ours are about an order of magnitude shorter than theirs) should probably not be compared, since we manipulate formulas over reals instead of integers; we thus add an extra abstraction for which, in general, a loss of precision is possible. In [BGL98], their method is extended to a composite approach (using BDDs), whose adaptation to the CLP setting may be an interesting task. In [CABN97], Chan, Anderson, Beame and Notkin incorporate an efficient representation of arithmetic constraints (linear and non-linear) into the BDDs of SMV [McM93]. This method uses an external constraint solver to prune states with unsatisfiable constraints. The combination of Boolean and arithmetic constraints for handling the interplay of control and data variables is a promising idea that fits ideally with the CLP paradigm and systems (where BDD-based Boolean constraint solvers are available).

7 Conclusion and Future Work 

We have explored a connection between the two fields of verification and pro- 
gramming languages, more specifically between model checking and CLP. We 
have given a reformulation of safety and liveness properties in terms of the well- 
studied CLP semantics, based on a novel translation of concurrent systems to 
CLP programs. We could define a model checking procedure in a setting where 
a fixpoint of an operator on infinite sets of states and a fixpoint of the corre- 
sponding operator on their implicit representations can be formally related via 
well-established results on program semantics. 

We have turned the theoretical insights into a practical tool. Our implemen- 
tation in a CLP system is direct and natural. One reason for this is that the 
two key operations used during the fixpoint iteration are testing entailment and 
conjoining constraints together with a satisfiability test. These operations are 
central to the CLP paradigm [JM94]; roughly, they take over the role of read 
and write operations for constraints as first-class data-structures. 

We have obtained experimental results for several example infinite-state systems over integers. Our tool, though prototypical, has shown reasonable performance on these examples, which gives rise to the hope that it will also be useful in further experiments. Its advantage over other tools may be the fact that its CLP-based setting makes some optimizations for specific examples more direct and transparent, and hence experimentation more flexible. In a sense, it provides a programming environment for model checking. We note that CLP systems such as SICStus already provide high-level support for building and integrating new constraint solvers (on any domain).

As for future work, we believe that more experience with practical examples is 
needed in order to estimate the effect of different fixpoint evaluation strategies 
and different forms of constraint weakening for conservative approximations. 
We believe that after such experimentation it may be useful to look into more 
specialized implementations. 
Abstract. McMillan has presented a deadlock detection method for Petri nets based on finite complete prefixes (i.e. net unfoldings). The basic idea is to transform the PSPACE-complete deadlock detection problem for a 1-safe Petri net into a potentially exponentially larger NP-complete problem of deadlock detection for a finite complete prefix. McMillan suggested a branch-and-bound algorithm for deadlock detection in prefixes. Recently, Melzer and Römer have presented another approach, which is based on solving mixed integer programming problems. In this work it is shown that instead of using mixed integer programming, a constraint-based logic programming framework can be employed, and a linear-size translation from deadlock detection in prefixes into the problem of finding a stable model of a logic program is presented. As a side result, such a translation is also devised for the reachability problem. Experimental results are given from an implementation combining the prefix generator of the PEP tool, the translation, and an implementation of a constraint-based logic programming framework, the smodels system. The experiments show the proposed approach to be quite competitive when compared to the approaches of McMillan and Melzer/Römer.



1 Introduction 

Petri nets are a widely used model for analyzing concurrent and distributed systems. Often such a system must exhibit reactive, non-terminating behavior, and one of the key analysis problems is that of deadlock-freedom: do all reachable global states of the system (markings of the net) enable some action (net transition)? In this work we study this problem for a subclass of Petri nets, the 1-safe Petri nets, which are capable of modelling finite-state systems. For 1-safe Petri nets the deadlock detection problem is PSPACE-complete in the size of the net [4]; however, restricted subclasses of 1-safe Petri nets exist for which this problem is NP-complete [10, 11]. McMillan has presented a deadlock detection method for Petri nets based on finite complete prefixes (i.e. net unfoldings) [10, 11]. The basic idea is to transform the PSPACE-complete deadlock detection problem for a 1-safe Petri net into a potentially exponentially larger NP-complete problem. This translation creates a finite complete prefix, which is an acyclic 1-safe Petri net of a restricted form. Experimental results show that the blowup of the transformation can in many cases be avoided [5, 10, 11, 12].

In this work we address the NP-complete deadlock detection problem for finite complete prefixes. McMillan originally suggested a branch-and-bound algorithm for solving this problem. Recently, Melzer and Römer have presented another algorithm, which is based on solving mixed integer programming problems generated from prefixes [12]. Their approach seems to be faster than McMillan's on examples in which a large percentage of the events of the prefix are so-called cut-off events. However, if this assumption does not hold, the run times are generally slower than those of McMillan's algorithm [12].

In this work we study an approach that is similar to that of Melzer and Römer in that it can handle cases with a large percentage of cut-off events, but with more competitive performance. Instead of mixed integer programming, our approach is based on a constraint-based logic programming framework [13, 14, 15]. We translate the deadlock detection problem into the problem of finding a stable model of a logic program. As a side result we also obtain such a translation for the reachability problem, which is also NP-complete in the size of the prefix [4]. For the deadlock detection problem we present experimental results, and find our approach competitive with the two previous approaches.

The rest of the paper is organized as follows. First we present the Petri net notations used in the paper. In Sect. 3 we introduce the rule-based constraint programming framework. Section 4 contains the main results of this work: linear-size translations from deadlock and reachability property checking into the problem of finding a stable model of a logic program. In Sect. 5 we present experimental results from our implementation. In Sect. 6 we conclude and discuss directions for future research.



2 Petri Net Definitions 

First we define basic Petri net notations. Next we introduce occurrence nets, 
which are Petri nets of a restricted form. Then branching processes are given as 
a way of describing partial order semantics for Petri nets. Last but not least we 
define finite complete prefixes as a way of giving a finite representation of this 
partial order behavior. We follow mainly the notation of [5, 12]. 



2.1 Petri Nets 

A triple $(S, T, F)$ is a net if $S \cap T = \emptyset$ and $F \subseteq (S \times T) \cup (T \times S)$. The elements of $S$ are called places, and the elements of $T$ transitions. Places and transitions are also called nodes. We identify $F$ with its characteristic function on the set $(S \times T) \cup (T \times S)$. The preset of a node $x$, denoted by ${}^\bullet x$, is the set $\{y \in S \cup T \mid F(y, x) = 1\}$. The postset of a node $x$, denoted by $x^\bullet$, is the set $\{y \in S \cup T \mid F(x, y) = 1\}$. Their generalizations to sets of nodes $X \subseteq S \cup T$ are defined as ${}^\bullet X = \bigcup_{x \in X} {}^\bullet x$ and $X^\bullet = \bigcup_{x \in X} x^\bullet$, respectively.

A marking of a net $(S, T, F)$ is a mapping $S \mapsto \mathbb{N}$. A marking $M$ is identified with the multiset which contains $M(s)$ copies of $s$ for every $s \in S$. A 4-tuple $\Sigma = (S, T, F, M_0)$ is a net system if $(S, T, F)$ is a net and $M_0$ is a marking of $(S, T, F)$. A marking $M$ enables a transition $t$ if $\forall s \in S: F(s, t) \leq M(s)$. If $t$ is enabled, it can occur, leading to a new marking (denoted $M \xrightarrow{t} M'$), where $M'$ is defined by $\forall s \in S: M'(s) = M(s) - F(s, t) + F(t, s)$. A marking $M$ is a deadlock marking iff no transition $t$ is enabled by $M$. A marking $M_n$ is reachable in $\Sigma$ iff there exist a sequence of transitions $t_1, t_2, \ldots, t_n$ and markings $M_1, M_2, \ldots, M_{n-1}$ such that $M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} \cdots M_{n-1} \xrightarrow{t_n} M_n$. A reachable marking is 1-safe if $\forall s \in S: M(s) \leq 1$. A net system $\Sigma$ is 1-safe if all its reachable markings are 1-safe. In this work we restrict ourselves to net systems which are 1-safe, have a finite number of places and transitions, and in which each transition $t \in T$ has both nonempty pre- and postsets.
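These definitions can be exercised directly. The Python sketch below assumes a 1-safe net with 0/1 arcs, so a marking can be represented as the set of marked places; it enumerates reachable markings breadth-first and returns a deadlock marking if one exists. This explicit-state search is only a baseline for intuition (exponential in the size of the net in the worst case) and is not the prefix-based method developed in this paper; the names `enabled`, `fire` and `find_deadlock` are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Marking = FrozenSet[str]  # 1-safe: a marking is the set of marked places
Net = Dict[str, Tuple[Set[str], Set[str]]]  # transition -> (preset, postset)


def enabled(net: Net, m: Marking) -> List[str]:
    """Transitions t with F(s, t) <= M(s) for all places s."""
    return [t for t, (pre, _) in net.items() if pre <= m]


def fire(net: Net, m: Marking, t: str) -> Marking:
    """M'(s) = M(s) - F(s, t) + F(t, s), valid for 1-safe nets."""
    pre, post = net[t]
    return (m - pre) | post


def find_deadlock(net: Net, m0: Marking) -> Optional[Marking]:
    """Breadth-first search over reachable markings; a deadlock or None."""
    seen = {m0}
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        ts = enabled(net, m)
        if not ts:
            return m
        for t in ts:
            m2 = fire(net, m, t)
            if m2 not in seen:
                seen.add(m2)
                queue.append(m2)
    return None


# Hypothetical example: t1 and t2 form a cycle; t3 leads into a dead end.
NET: Net = {"t1": ({"p1"}, {"p2"}), "t2": ({"p2"}, {"p1"})}
NET_DEAD: Net = dict(NET, t3=({"p2"}, {"p3"}))
```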



2.2 Occurrence Nets 

We use $\leq_F$ to denote the reflexive transitive closure of $F$. Let $(S, T, F)$ be a net and let $x_1, x_2 \in S \cup T$. The nodes $x_1$ and $x_2$ are in conflict, denoted by $x_1 \# x_2$, if there exist $t_1, t_2 \in T$ such that $t_1 \neq t_2$, ${}^\bullet t_1 \cap {}^\bullet t_2 \neq \emptyset$, $t_1 \leq_F x_1$, and $t_2 \leq_F x_2$. An occurrence net is a net $N = (B, E, F)$ such that:

- $\forall b \in B: |{}^\bullet b| \leq 1$,

- $F$ is acyclic, i.e. the irreflexive transitive closure of $F$ is a partial order,

- $N$ is finitely preceded, i.e. for any node $x$ of the net, the set of nodes $y$ such that $y \leq_F x$ is finite, and

- $\forall x \in B \cup E: \neg(x \# x)$.

The elements of $B$ and $E$ are called conditions and events, respectively. The set $\mathit{Min}(N)$ denotes the set of minimal elements of the transitive closure of $F$. A configuration $C$ of an occurrence net is a set of events satisfying:

- If $e \in C$ then $\forall e' \in E: e' \leq_F e$ implies $e' \in C$ ($C$ is causally closed),

- $\forall e, e' \in C: \neg(e \# e')$ ($C$ is conflict-free).
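Both conditions on configurations can be checked mechanically. In the Python sketch below (a hypothetical encoding of our own), an occurrence net is a dictionary mapping each node to its successors under $F$; `below` computes $\{y \mid y \leq_F x\}$, `in_conflict` implements $x_1 \# x_2$ literally from the definition, and `is_configuration` tests causal closure and conflict-freeness.

```python
from itertools import combinations
from typing import Dict, Set

Flow = Dict[str, Set[str]]  # node -> set of successor nodes (the relation F)


def predecessors(flow: Flow) -> Flow:
    preds: Flow = {}
    for a, succs in flow.items():
        for b in succs:
            preds.setdefault(b, set()).add(a)
    return preds


def below(preds: Flow, x: str) -> Set[str]:
    """All y with y <=_F x (reflexive-transitive predecessors)."""
    seen, stack = {x}, [x]
    while stack:
        for y in preds.get(stack.pop(), set()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def in_conflict(flow: Flow, events: Set[str], x1: str, x2: str) -> bool:
    """x1 # x2: distinct events t1 <=_F x1, t2 <=_F x2 share an input condition."""
    preds = predecessors(flow)
    for t1 in events & below(preds, x1):
        for t2 in events & below(preds, x2):
            if t1 != t2 and preds.get(t1, set()) & preds.get(t2, set()):
                return True
    return False


def is_configuration(flow: Flow, events: Set[str], C: Set[str]) -> bool:
    preds = predecessors(flow)
    closed = all((events & below(preds, e)) <= C for e in C)
    conflict_free = not any(in_conflict(flow, events, e1, e2)
                            for e1, e2 in combinations(sorted(C), 2))
    return closed and conflict_free


# Hypothetical occurrence net: e1 and e2 compete for b1; e3 follows e1.
FLOW: Flow = {"b1": {"e1", "e2"}, "e1": {"b2"}, "e2": {"b4"},
              "b2": {"e3"}, "e3": {"b3"}}
EVENTS = {"e1", "e2", "e3"}
```

In this example, {e1, e3} is a configuration, {e3} is not causally closed, and {e1, e2} is not conflict-free; the conflict e1 # e2 is also inherited by e3, so e3 # e2.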



2.3 Branching Processes 

Branching processes are "unfoldings" of net systems and were introduced by Engelfriet [3]. Let $N_1 = (S_1, T_1, F_1)$ and $N_2 = (S_2, T_2, F_2)$ be two nets. A homomorphism is a mapping $h: S_1 \cup T_1 \mapsto S_2 \cup T_2$ such that $h(S_1) \subseteq S_2 \wedge h(T_1) \subseteq T_2$, and for all $t \in T_1$, the restriction of $h$ to ${}^\bullet t$ is a bijection between ${}^\bullet t$ and ${}^\bullet h(t)$, and similarly for $t^\bullet$ and $h(t)^\bullet$. A branching process of a net system $\Sigma$ is a tuple $\beta = (N', h)$, where $N'$ is an occurrence net, and $h$ is a homomorphism from $N'$ to $(S, T, F)$ such that: the restriction of $h$ to $\mathit{Min}(N')$ is a bijection between $\mathit{Min}(N')$ and $M_0$, and $\forall e_1, e_2 \in E$, if ${}^\bullet e_1 = {}^\bullet e_2 \wedge h(e_1) = h(e_2)$ then $e_1 = e_2$. The set of places associated with a configuration $C$ of $\beta$ is denoted by $\mathit{Mark}(C) = h((\mathit{Min}(N) \cup C^\bullet) \setminus {}^\bullet C)$. A configuration $C$ is a deadlock configuration iff the set $(\mathit{Min}(N) \cup C^\bullet) \setminus {}^\bullet C$ does not enable any event $e \in E$.
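A short Python sketch of $\mathit{Mark}(C)$ and of the deadlock-configuration test, using a hypothetical encoding of our own (successor dictionary for $F$, a set of minimal conditions, and a dictionary for $h$ on conditions). It assumes that the argument $C$ is already a configuration.

```python
from typing import Dict, Set

Flow = Dict[str, Set[str]]  # node -> successors in the occurrence net


def preset(flow: Flow, x: str) -> Set[str]:
    return {a for a, succs in flow.items() if x in succs}


def cut(flow: Flow, minimal: Set[str], C: Set[str]) -> Set[str]:
    """Conditions (Min(N) + post(C)) - pre(C) marked after executing C."""
    post = set().union(*(flow.get(e, set()) for e in C))
    pre = set().union(*(preset(flow, e) for e in C))
    return (minimal | post) - pre


def mark(flow: Flow, minimal: Set[str], h: Dict[str, str], C: Set[str]) -> Set[str]:
    """Mark(C): the places that the homomorphism h assigns to the cut."""
    return {h[b] for b in cut(flow, minimal, C)}


def is_deadlock_configuration(flow: Flow, minimal: Set[str],
                              events: Set[str], C: Set[str]) -> bool:
    """No event has its whole preset inside the cut reached by C."""
    k = cut(flow, minimal, C)
    return not any(preset(flow, e) <= k for e in events)


# Hypothetical prefix: b1 -> e1 -> b2 -> e2 -> b3, mapped to places p1, p2, p3.
FLOW: Flow = {"b1": {"e1"}, "e1": {"b2"}, "b2": {"e2"}, "e2": {"b3"}}
MIN = {"b1"}
H = {"b1": "p1", "b2": "p2", "b3": "p3"}
EVENTS = {"e1", "e2"}
```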

2.4 Finite Complete Prefixes 

A finite branching process $\beta$ is a finite complete prefix of a net system $\Sigma$ iff for each reachable marking $M$ of $\Sigma$ there exists a configuration $C$ of $\beta$ such that:

- $\mathit{Mark}(C) = M$, and

- for every transition $t$ enabled in $M$ there exists a configuration $C \cup \{e\}$ such that $e \notin C$ and $h(e) = t$.

Algorithms to obtain a finite complete prefix $\beta$ given a 1-safe net system $\Sigma$ are presented in, e.g., [5, 10, 11]. The algorithms mark some events of the prefix $\beta$ as special cut-off events, which we denote by the set $\mathit{CutOffs}(\beta) \subseteq E$. The intuition behind cut-off events is that for each cut-off event $e$ there already exists another event $e'$ in the prefix such that the markings reachable after executing $e$ can also be reached after executing $e'$; thus the markings after $e$ need not be considered any further. Due to space limitations we refer the interested reader to [5, 10, 11, 12].

3 Rule-Based Constraint Programming 

We will use normal logic programs with stable model semantics [6] as the un- 
derlying formalism into which the deadlock and reachability problems for 1-safe 
Petri nets are translated. This section is to a large extent based on [15]. 

The stable model semantics is one of the main declarative semantics for nor- 
mal logic programs. However, here we use logic programming in a way that is 
different from the typical PROLOG style paradigm, which is based on the idea 
of evaluating a given query. Instead, we employ logic programs as a constraint 
programming framework [13], where stable models are the solutions of the pro- 
gram rules seen as constraints. We consider normal logic programs that consist 
of rules of the form 



$$h \leftarrow a_1, \ldots, a_n, \mathit{not}(b_1), \ldots, \mathit{not}(b_m) \qquad (1)$$

where $a_1, \ldots, a_n, b_1, \ldots, b_m$ and $h$ are propositional atoms. Such a rule can be seen as a constraint saying that if atoms $a_1, \ldots, a_n$ are in a model and atoms $b_1, \ldots, b_m$ are not in a model, then the atom $h$ is in a model. The stable model semantics also enforces minimality and groundedness of models. This makes many combinatorial problems easily and succinctly describable using logic programming with stable model semantics.

We will demonstrate the basic behavior of the semantics using programs P1-P4:

P1:  a ← not(b)
     b ← not(a)
P2:  a ← a
P3:  a ← not(a)
P4:  a ← not(b), c
     b ← not(a)

Program P1 has two stable models: {a} and {b}. The property of this program is that we may freely make negative assumptions as long as we do not run into any contradictions. For example, we may assume not(b) in order to deduce the stable model {a}. Program P2 has the empty set as its unique stable model. This exposes the fact that we cannot use positive assumptions to deduce what is to be included in a model. Program P3 is an example of a program which has no stable models. If we assume not(a), then we deduce a, which contradicts our assumption not(a). Program P4 has one stable model, {b}. If we assume not(a), then we deduce b. If we assume not(b), then we cannot deduce a, because c cannot be deduced from our assumptions.

The stable model semantics for a normal logic program $P$ is defined as follows [6]. The reduct $P^A$ of $P$ with respect to the set of atoms $A$ is obtained (i) by deleting each rule in $P$ that has a not-atom $\mathit{not}(x)$ in its body such that $x \in A$, and (ii) by deleting all not-atoms in the remaining rules. A set of atoms $A$ is a stable model of $P$ if and only if $A$ is the deductive closure of $P^A$ when the rules in $P^A$ are seen as inference rules.

A non-deterministic way of constructing stable models is to guess which 
assumptions (not-atoms of the program) to use, and then check using the de- 
ductive closure (in linear time) whether the resulting model agrees with the 
assumptions. The problem of determining the existence of a stable model is in 
fact NP-complete [9]. 
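The guess-and-check reading of the definition can be written down directly. The Python sketch below is a brute-force illustration of our own, not how smodels works: it enumerates every candidate set $A$, builds the reduct $P^A$, and compares $A$ with the deductive closure. The naive closure loop is quadratic rather than the linear-time computation mentioned above, and the guessing is exponential. Rules are written as hypothetical triples `(h, positive_atoms, negated_atoms)`.

```python
from itertools import combinations
from typing import List, Set, Tuple

# A rule h <- a_1, ..., a_n, not(b_1), ..., not(b_m) is written (h, (a...), (b...)).
Rule = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def reduct(program: List[Rule], A: Set[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """P^A: (i) drop rules with some not(x), x in A; (ii) drop remaining not-atoms."""
    return [(h, pos) for h, pos, neg in program if not set(neg) & A]


def closure(definite: List[Tuple[str, Tuple[str, ...]]]) -> Set[str]:
    """Deductive closure of a negation-free program by naive forward chaining."""
    model: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for h, pos in definite:
            if h not in model and set(pos) <= model:
                model.add(h)
                changed = True
    return model


def stable_models(program: List[Rule]) -> List[Set[str]]:
    """Guess every candidate A and check that A equals the closure of P^A."""
    atoms = sorted({a for h, pos, neg in program for a in (h, *pos, *neg)})
    return [set(g) for k in range(len(atoms) + 1)
            for g in combinations(atoms, k)
            if closure(reduct(program, set(g))) == set(g)]


P1 = [("a", (), ("b",)), ("b", (), ("a",))]
P2 = [("a", ("a",), ())]
P3 = [("a", (), ("a",))]
P4 = [("a", ("c",), ("b",)), ("b", (), ("a",))]
```

Running it on P1-P4 reproduces the stable models discussed above: {a} and {b} for P1, only the empty set for P2, none for P3, and {b} for P4.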

3.1 The tool smodels 

There is a tool, the smodels system [14, 15], which provides an implementation 
of logic programs as a rule-based constraint programming framework. It finds 
(some or all) stable models of a logic program. It can also tell when the program 
has no stable models. It contains strong pruning techniques to make the problem 
tractable for a large class of programs. The smodels implementation needs space 
linear in the size of the input program [15]. 

The stable model semantics is defined using rules of the form (1). smodels 2 handles extended rule types, which can be seen as succinct encodings of sets of basic rules. One of the rule types is a rule of the form $h \leftarrow 2\{a_1, \ldots, a_n\}$. The semantics of this rule is that if two or more atoms from the set $\{a_1, \ldots, a_n\}$ belong to the model, then the atom $h$ will also be in the model. It is easy to see that this rule can be encoded by using $n(n-1)/2$ basic rules of the form $h \leftarrow a_i, a_j$ (with $1 \leq i < j \leq n$). Using an extended rule instead of the corresponding basic rule encoding was necessary to achieve a linear-size translation of the two problems at hand.
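The size argument can be checked with a few lines of Python: expanding one extended rule over $n$ atoms yields $n(n-1)/2$ basic rules, i.e. quadratically many, whereas the extended rule itself has size linear in $n$. The helper names below are hypothetical; `derives_head` just confirms that the expansion fires exactly when at least two of the atoms hold.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # basic rule h <- a_i, a_j (no negation)


def expand_two_of(head: str, atoms: List[str]) -> List[Rule]:
    """Encode h <- 2{a_1, ..., a_n} by the n(n-1)/2 rules h <- a_i, a_j (i < j)."""
    return [(head, pair) for pair in combinations(atoms, 2)]


def derives_head(rules: List[Rule], true_atoms: FrozenSet[str]) -> bool:
    """Does some basic rule fire when exactly `true_atoms` hold?"""
    return any(set(body) <= true_atoms for _, body in rules)
```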

We also use so-called integrity rules in the programs. They are rules with no head, i.e. of the form $\leftarrow a_1, \ldots, a_n, \mathit{not}(b_1), \ldots, \mathit{not}(b_m)$. The semantics is the following: a new atom $f$ is introduced to the program, and the integrity rule is replaced by $f \leftarrow a_1, \ldots, a_n, \mathit{not}(b_1), \ldots, \mathit{not}(b_m), \mathit{not}(f)$. It is easy to see that no set of atoms in which $a_1, \ldots, a_n$ are in the model and $b_1, \ldots, b_m$ are not is a stable model. It is also easy to see that the rule does not add any new stable models. The last extended rule we use is of the form $\{h\} \leftarrow a_1, \ldots, a_n$. The semantics is the following: a new atom $h'$ is introduced to the program, and the rule is replaced by two rules, $h \leftarrow a_1, \ldots, a_n, \mathit{not}(h')$ and $h' \leftarrow \mathit{not}(h)$. The atom $h'$ is removed from any stable models it appears in, and the rest of the model gives the semantics for the extended rule.
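Both rewritings can be validated by brute force on small programs. The Python sketch below (our own illustration, with hypothetical names) compiles integrity and choice rules into basic rules with fresh atoms exactly as described, computes stable models by enumeration, and strips the auxiliary atoms. We assume that fresh names such as `_f0` and `a'` do not already occur in the program.

```python
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...], Tuple[str, ...]]  # (h, positive, negated)


def stable_models(rules: List[Rule]) -> List[Set[str]]:
    """Brute force: A is stable iff A is the least model of the reduct P^A."""
    atoms = sorted({a for h, pos, neg in rules for a in (h, *pos, *neg)})
    found = []
    for k in range(len(atoms) + 1):
        for guess in combinations(atoms, k):
            A = set(guess)
            reduct = [(h, pos) for h, pos, neg in rules if not set(neg) & A]
            model, changed = set(), True
            while changed:
                changed = False
                for h, pos in reduct:
                    if h not in model and set(pos) <= model:
                        model.add(h)
                        changed = True
            if model == A:
                found.append(A)
    return found


def compile_extended(basic, integrity, choice):
    """Replace '<- body' and '{h} <- body' by basic rules with fresh atoms."""
    rules, aux = list(basic), set()
    for k, (pos, neg) in enumerate(integrity):
        f = f"_f{k}"  # fresh atom f: f <- body, not(f)
        aux.add(f)
        rules.append((f, tuple(pos), tuple(neg) + (f,)))
    for h, pos in choice:
        h2 = h + "'"  # fresh atom h': h <- body, not(h') and h' <- not(h)
        aux.add(h2)
        rules.append((h, tuple(pos), (h2,)))
        rules.append((h2, (), (h,)))
    return rules, aux


def models_of(basic, integrity, choice):
    """Stable models of the compiled program with auxiliary atoms removed."""
    rules, aux = compile_extended(basic, integrity, choice)
    return {frozenset(A - aux) for A in stable_models(rules)}
```

With two choice rules {a} ← and {b} ←, all four subsets of {a, b} are stable models; adding the integrity rule ← a, b removes exactly {a, b}, and adds no new model.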

4 Translating Deadlock and Reachability Property 
Checking into Logic Programs 

In this section we present the translations of deadlock and reachability properties into logic programs with stable model semantics. For the deadlock property the main result can be seen as a rephrasing of Theorem 4 of [12], where mixed integer programming has been replaced by the rule-based constraint programming framework. For the reachability property we give another translation.

In this work we assume that the set of events of a finite complete prefix is non-empty. If it is empty, the corresponding net system has no transitions enabled in the initial state, and the deadlock and reachability properties can then be trivially solved by looking at the initial state only.

Now we are ready to define our translation from the finite complete pre- 
fixes into logic programs with stable model semantics. The basic part of our 
translation is given next. It translates the notion of a configuration of a finite 
complete prefix into the problem of finding a stable model of a logic program. 
The definitions will be followed by an example translation given in Fig. 1. 

First we define some additional notation. We assume a unique numbering of the events (and conditions) of the finite complete prefix. We use the notation $e_i$ ($b_i$) to refer to the event (condition) number $i$. In the logic programs, $e_i$ ($b_i$) is an atom of the logic program corresponding to the event $e_i$ (condition $b_i$).

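Definition 1. Let $\beta = (N, h)$ with $N = (B, E, F)$ be a finite complete prefix of a given 1-safe net system $\Sigma$. Let $P_B(\beta)$ be a logic program containing the following rules:

1. For all $e_i \in E \setminus \mathit{CutOffs}(\beta)$ a rule:

   $e_i \leftarrow e_{p_1}, \ldots, e_{p_n}, \mathit{not}(b_{e_i})$,

   such that $\{e_{p_1}, \ldots, e_{p_n}\} = {}^\bullet({}^\bullet e_i)$.

2. For all $e_i \in E \setminus \mathit{CutOffs}(\beta)$ a rule:

   $b_{e_i} \leftarrow \mathit{not}(e_i)$.

3. For all $b_i \in B$ such that $|b_i^\bullet \setminus \mathit{CutOffs}(\beta)| \geq 2$ a rule:

   $\leftarrow 2\{e_{p_1}, \ldots, e_{p_n}\}$,

   such that $\{e_{p_1}, \ldots, e_{p_n}\} = b_i^\bullet \setminus \mathit{CutOffs}(\beta)$.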
In the logic program definitions of this paper we use the convention that a part of a rule is omitted if the corresponding set evaluates to the empty set. For example, rule 1 for an event $e_i$ such that ${}^\bullet({}^\bullet e_i) = \emptyset$ becomes $e_i \leftarrow \mathit{not}(b_{e_i})$. The translation above could be trivially extended to also include the cut-off events, but they are not needed by the applications in this work.
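Definition 1 can be read as a small program generator. The Python sketch below uses a hypothetical prefix encoding of our own (each event mapped to its preset and postset of condition names, plus a set of cut-off events), writes the atom $b_{e_i}$ as the string `"b" + e`, and returns the rules of the form `(h, positive, negated)` separately from the atom sets of the `← 2{...}` rules. Since each condition has at most one input event, ${}^\bullet({}^\bullet e_i)$ is obtained through a producer map.

```python
from typing import Dict, List, Set, Tuple

Prefix = Dict[str, Tuple[Set[str], Set[str]]]  # event -> (preset, postset)
Rule = Tuple[str, Tuple[str, ...], Tuple[str, ...]]  # h <- pos, not(neg)


def translate_pb(events: Prefix,
                 cutoffs: Set[str]) -> Tuple[List[Rule], List[Tuple[str, ...]]]:
    # In an occurrence net every condition has at most one input event.
    producer = {b: e for e, (_, post) in events.items() for b in post}
    rules: List[Rule] = []
    for e in sorted(set(events) - cutoffs):
        # Rule 1: e_i <- events producing the input conditions, not(b_{e_i}).
        pre_ev = tuple(sorted({producer[b] for b in events[e][0] if b in producer}))
        rules.append((e, pre_ev, ("b" + e,)))
        # Rule 2: b_{e_i} <- not(e_i).
        rules.append(("b" + e, (), (e,)))
    consumers: Dict[str, Set[str]] = {}
    for e in set(events) - cutoffs:
        for b in events[e][0]:
            consumers.setdefault(b, set()).add(e)
    # Rule 3: <- 2{...} for conditions with >= 2 non-cut-off output events.
    two_of = [tuple(sorted(c)) for b, c in sorted(consumers.items()) if len(c) >= 2]
    return rules, two_of


# Hypothetical prefix: e1 and e2 compete for b2; e3 follows e1; e4 is a cut-off.
PREFIX: Prefix = {"e1": ({"b1", "b2"}, {"b3"}), "e2": ({"b2"}, {"b4"}),
                  "e3": ({"b3"}, {"b5"}), "e4": ({"b4"}, {"b6"})}
```

For this prefix with cut-off event e4 the generator produces six rules, among them e3 ← e1, not(be3), and the single cardinality rule ← 2{e1, e2} for the shared condition b2.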

We define a mapping from a set of events of the prefix to a set of atoms of a 
logic program and vice versa. 

Definition 2. The set of atoms of a logic program $P$ corresponding to a set of events $C \subseteq E \setminus \mathit{CutOffs}(\beta)$ of a finite complete prefix $\beta$ is $\mathit{Model}(C) = \{e_i \mid e_i \in C\} \cup \{b_{e_j} \mid e_j \in E \setminus (C \cup \mathit{CutOffs}(\beta))\}$.

Definition 3. The set of events corresponding to a stable model $A$ of a logic program $P$ is $\mathit{Events}(A) = \{e_i \in E \mid e_i \in A\}$.

Now we are ready to state the correspondence between the finite complete 
prefix and the core part of our translation. Proofs of the theorems are omitted. 

Theorem 1. Let $\beta$ be a finite complete prefix of a 1-safe net system $\Sigma$, let $P_B(\beta)$ be the logic program translation by Def. 1, and let $C$ be a configuration of $\beta$ such that $C \cap \mathit{CutOffs}(\beta) = \emptyset$. Then the set of atoms $A = \mathit{Model}(C)$ is a stable model of $P_B(\beta)$. Additionally, the mapping $\mathit{Events}(A)$ is a bijective mapping from the stable models of $P_B(\beta)$ to the configurations of $\beta$ which contain no cut-off events.

Next we move to the deadlock translation. We add a set of rules to the program which place additional constraints on the stable models of $P_b(\beta)$: integrity rules remove all stable models of the basic program which do not correspond to deadlocks. To do this we model the enabling of each event (cut-off or not) of the prefix in the logic program.

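Definition 4. Let $\beta$ be a finite complete prefix of a given 1-safe net system $\Sigma$. Let $P_d(\beta)$ be a logic program containing all the rules of the program $P_b(\beta)$ of Def. 1, and also the following rules:

1. For all $b_i \in \{b_j \in B \mid b_j^\bullet \neq \emptyset\}$ a rule:

$b_i \leftarrow e_l, \mathrm{not}(e_{p_1}), \ldots, \mathrm{not}(e_{p_n})$,

such that $\{e_l\} = {}^\bullet b_i$ and $\{e_{p_1}, \ldots, e_{p_n}\} = b_i^\bullet \setminus \mathit{CutOffs}(\beta)$.

2. For all $e_i \in E$ a rule:

$\leftarrow b_{p_1}, \ldots, b_{p_n}$,

such that $\{b_{p_1}, \ldots, b_{p_n}\} = {}^\bullet e_i$.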
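The rules that Def. 4 adds can be generated in the same local style. In this hypothetical Python sketch the prefix is given by its condition and event sets and its flow relation as (source, target) pairs. An initial condition has an empty preset, which simply leaves the positive body empty. The helper names are ours.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    head: Optional[str]  # None marks an integrity rule "<- body"
    pos: tuple = ()
    neg: tuple = ()


def preset(x, flow):
    return {a for (a, b) in flow if b == x}


def postset(x, flow):
    return {b for (a, b) in flow if a == x}


def deadlock_rules(conditions, events, flow, cutoffs):
    """Rules that Def. 4 adds to P_b(beta) (hypothetical helper)."""
    cutoffs = set(cutoffs)
    rules = []
    for b in sorted(conditions):
        if not postset(b, flow):
            continue  # b enables no event, so its marking is irrelevant
        producer = tuple(sorted(preset(b, flow)))          # empty if b is initial
        consumers = tuple(sorted(postset(b, flow) - cutoffs))
        # Rule 1: b is marked iff its producer fired and no consumer fired.
        rules.append(Rule(b, producer, consumers))
    for e in sorted(events):  # every event, cut-offs included
        # Rule 2: in a deadlock no event may have its whole preset marked.
        rules.append(Rule(None, tuple(sorted(preset(e, flow)))))
    return rules


B = {"b1", "b2", "b3"}
F = {("b1", "e1"), ("b1", "e2"), ("e1", "b2"), ("e2", "b3")}
for rule in deadlock_rules(B, {"e1", "e2"}, F, cutoffs=set()):
    print(rule)
```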

Theorem 2. Let $\beta$ be a finite complete prefix of a 1-safe net system $\Sigma$, and let $P_d(\beta)$ be the logic program translation by Def. 4. There exists a stable model of $P_d(\beta)$ iff $\Sigma$ has a reachable deadlock marking $M$. Additionally, for any stable model $A$ of $P_d(\beta)$, the set of events $C = \mathit{Events}(A)$ is a deadlock configuration of $\beta$ such that $\mathit{Mark}(C)$ is a reachable deadlock marking of $\Sigma$.
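Theorem 2 can be checked by hand on a very small example. The Python sketch below enumerates stable models by brute force using the Gelfond-Lifschitz reduct. It is exponential and only meant as an illustration, not as a stand-in for smodels. The toy prefix (an initially marked condition b1 consumed by two conflicting events e1 and e2 that produce b2 and b3, with no cut-offs) and the encoding of rules as `Rule`/`Card` tuples are assumptions made for this example.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    head: Optional[str]  # None marks an integrity rule "<- pos, not(neg)"
    pos: tuple = ()
    neg: tuple = ()


class Card(NamedTuple):
    bound: int           # integrity rule "<- k{atoms}"
    atoms: tuple


def least_model(definite):
    """Least model of a negation-free program given as (head, body) pairs."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in definite:
            if head not in model and all(a in model for a in body):
                model.add(head)
                changed = True
    return model


def violated(constraint, M):
    if isinstance(constraint, Card):
        return sum(a in M for a in constraint.atoms) >= constraint.bound
    return (all(a in M for a in constraint.pos)
            and not any(a in M for a in constraint.neg))


def stable_models(program):
    """Enumerate stable models by brute force (exponential; toy use only)."""
    normal = [r for r in program if isinstance(r, Rule) and r.head is not None]
    constraints = [r for r in program if isinstance(r, Card) or r.head is None]
    atoms = sorted({r.head for r in normal})
    for k in range(len(atoms) + 1):
        for guess in combinations(atoms, k):
            M = set(guess)
            # Gelfond-Lifschitz reduct: delete rules with not(a) for some
            # a in M, then drop the remaining negative literals.
            reduct = [(r.head, r.pos) for r in normal
                      if not any(a in M for a in r.neg)]
            if least_model(reduct) == M and not any(violated(c, M) for c in constraints):
                yield M


P_d = [
    # P_b(beta), Def. 1
    Rule("e1", neg=("be1",)), Rule("be1", neg=("e1",)),
    Rule("e2", neg=("be2",)), Rule("be2", neg=("e2",)),
    Card(2, ("e1", "e2")),
    # Def. 4 additions
    Rule("b1", neg=("e1", "e2")),
    Rule(None, pos=("b1",)),  # e1 must not be enabled
    Rule(None, pos=("b1",)),  # e2 must not be enabled
]
for M in stable_models(P_d):
    print(sorted(M), "-> deadlock configuration",
          sorted(a for a in M if a in {"e1", "e2"}))
```

Both stable models found, {e1, be2} and {be1, e2}, correspond to the deadlock configurations {e1} and {e2}. The empty configuration is rejected because b1 would still enable e1 and e2. Running `stable_models(P_d[:5])`, i.e. $P_b$ alone, yields three stable models, one for each cut-off-free configuration, in line with Theorem 1.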
[Figure: the 1-safe net system N1 and its finite complete prefix N2.]

P_d(N2):

  e1 ← not(be1)
  be1 ← not(e1)
  e2 ← not(be2)
  be2 ← not(e2)
  e3 ← not(be3)
  be3 ← not(e3)
  e5 ← e1, not(be5)
  be5 ← not(e5)
  e6 ← e5, not(be6)
  be6 ← not(e6)
  ← 2{e1, e2, e3}
  b1 ← not(e1)
  b2 ← not(e1), not(e2), not(e3)
  b3 ← e1
  b4 ← e1, not(e5)
  b5 ← e2
  b7 ← e5, not(e6)
  ← b1, b2
  ← b2
  ← b3
  ← b4
  ← b5
  ← b7

Fig. 1. Deadlock translation example.



In Fig. 1 an example of the deadlock translation is given. The prefix $N_2$ is a finite complete prefix of the 1-safe net system $N_1$. The cut-off events of $N_2$ are marked with crosses. The translated program $P_d(N_2)$ has only one stable model $A = \{be_1, be_2, e_3, be_5, be_6, b_1\}$, and the set $\mathit{Events}(A) = \{e_3\}$ is a deadlock configuration of $N_2$.

Next we present a way of translating reachability problems. First we need a way of making statements about an individual marking $M$.

Definition 5. An assertion on a marking of a 1-safe net system $\Sigma = (S, T, F, M_0)$ is a tuple $(S^+, S^-)$, where $S^+, S^- \subseteq S$ and $S^+ \cap S^- = \emptyset$. The assertion $(S^+, S^-)$ agrees with a marking $M$ of $\Sigma$ iff:

$$S^+ \subseteq \{s \in S \mid M(s) = 1\} \;\wedge\; S^- \subseteq \{s \in S \mid M(s) = 0\}.$$
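Def. 5 amounts to two subset tests. The following sketch assumes, for illustration only, that a marking of the 1-safe net is a Python dict from place names to token counts (0 or 1). The name `agrees` is ours.

```python
def agrees(assertion, marking):
    """Def. 5: does the assertion (S_plus, S_minus) agree with marking M?"""
    s_plus, s_minus = assertion
    if s_plus & s_minus:
        raise ValueError("S+ and S- must be disjoint")
    marked = {s for s, tokens in marking.items() if tokens == 1}
    unmarked = {s for s, tokens in marking.items() if tokens == 0}
    return s_plus <= marked and s_minus <= unmarked


M = {"s1": 1, "s2": 0, "s3": 1}
print(agrees(({"s1"}, {"s2"}), M))  # True
print(agrees(({"s2"}, set()), M))   # False: s2 is not marked
```

Reachability of a complete marking $M$ corresponds to choosing $S^+$ as the places marked in $M$ and $S^-$ as all remaining places. For submarking reachability, places whose marking does not matter are simply left out of both sets.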

With assertions we can easily formulate both the reachability and the submarking reachability problems. The idea is again to add integrity rules to the program which remove all stable models of $P_b(\beta)$ that do not agree with the assertion. The basic structure is the same as for deadlocks; however, we also need a set of atoms which represent the marking of the original net.

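Definition 6. Let $\beta$ be a finite complete prefix of a given 1-safe net system $\Sigma = (S, T, F, M_0)$, and let $\phi = (S^+, S^-)$ be an assertion on the places of $\Sigma$. Let $P_r(\beta, \phi)$ be a logic program containing all the rules of the program $P_b(\beta)$ of Def. 1, and also the following rules:

1. For all $b_i \in \{b_j \in B \mid h(b_j) \in S^+ \cup S^- \wedge {}^\bullet b_j \subseteq E \setminus \mathit{CutOffs}(\beta)\}$ a rule:

$b_i \leftarrow e_l, \mathrm{not}(e_{p_1}), \ldots, \mathrm{not}(e_{p_n})$,

such that $\{e_l\} = {}^\bullet b_i$ and $\{e_{p_1}, \ldots, e_{p_n}\} = b_i^\bullet \setminus \mathit{CutOffs}(\beta)$.

2. For all $b_i \in \{b_j \in B \mid h(b_j) \in S^+ \cup S^- \wedge {}^\bullet b_j \subseteq E \setminus \mathit{CutOffs}(\beta)\}$ a rule:

$s_i \leftarrow b_i$,

such that $s_i = h(b_i)$.

3. For all $s_i \in S^+$ a rule:

$\leftarrow \mathrm{not}(s_i)$.

4. For all $s_i \in S^-$ a rule:

$\leftarrow s_i$.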
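Def. 6 can be sketched in the same way. The Python helper below is hypothetical: the labelling $h$ is passed as a dict from conditions to places, and $(S^+, S^-)$ as two sets of place names. Only conditions labelled by an asserted place and not produced by a cut-off event are translated, as the note following the definition explains.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    head: Optional[str]  # None marks an integrity rule
    pos: tuple = ()
    neg: tuple = ()


def preset(x, flow):
    return {a for (a, b) in flow if b == x}


def postset(x, flow):
    return {b for (a, b) in flow if a == x}


def reachability_rules(conditions, flow, cutoffs, h, s_plus, s_minus):
    """Rules that Def. 6 adds to P_b(beta) for the assertion (s_plus, s_minus)."""
    cutoffs = set(cutoffs)
    asserted = set(s_plus) | set(s_minus)
    rules = []
    for b in sorted(conditions):
        producer = preset(b, flow)
        # Skip conditions whose place is not asserted, and postset
        # conditions of cut-off events (cut-offs are never fired).
        if h[b] not in asserted or producer & cutoffs:
            continue
        consumers = tuple(sorted(postset(b, flow) - cutoffs))
        rules.append(Rule(b, tuple(sorted(producer)), consumers))  # rule 1
        rules.append(Rule(h[b], (b,)))                              # rule 2
    rules += [Rule(None, (), (s,)) for s in sorted(s_plus)]         # rule 3
    rules += [Rule(None, (s,)) for s in sorted(s_minus)]            # rule 4
    return rules


B = {"b1", "b2", "b3"}
F = {("b1", "e1"), ("b1", "e2"), ("e1", "b2"), ("e2", "b3")}
h = {"b1": "s1", "b2": "s2", "b3": "s3"}  # hypothetical labelling
for rule in reachability_rules(B, F, set(), h, {"s2"}, set()):
    print(rule)
```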

Note that in the definition above only conditions of the prefix $\beta$ and places of $\Sigma$ which can affect the assertion $\phi$ are translated. Also, postset conditions of cut-off events are not translated, because cut-offs will not be fired by the translation.

Theorem 3. Let $\beta$ be a finite complete prefix of a 1-safe net system $\Sigma$, and let $P_r(\beta, \phi)$ be the logic program translation by Def. 6. The logic program $P_r(\beta, \phi)$ has a stable model iff there exists a reachable marking of $\Sigma$ which agrees with $\phi$. Additionally, for any stable model $A$ of $P_r(\beta, \phi)$, the configuration $C = \mathit{Events}(A)$ is a configuration of $\beta$ such that $\mathit{Mark}(C)$ is a reachable marking of $\Sigma$ which agrees with $\phi$.

It is easy to see that the sizes of all the translations are linear in the size of the prefix $\beta$, i.e., $O(|B| + |E| + |F|)$. Because the rule-based constraint programming system we use needs linear space in the size of the input program, deadlock and reachability property checking exploiting these translations can be done in linear space in the size of the prefix. The translations are also local, which makes them straightforward to implement in linear time in the size of the prefix.



5 Deadlock Property Checking Implementation 

We have implemented the deadlock property checking translation using C++, and we plan to implement the reachability translation in the near future. The translation reads a binary file containing the description of a finite complete prefix generated by the PEP tool [7]. It generates a logic program using the deadlock translation, which is then passed through an internal interface to the smodels stable model generator. The translation performs the following optimizations:

1. Not generating the program if the number of cut-off events is zero.

2. Removal of blocking of “stubborn” transitions: If we find an event $e_i$ such that $({}^\bullet e_i)^\bullet \setminus \mathit{CutOffs}(\beta) = \{e_i\}$, the corresponding rule of type 1 of the program $P_b(\beta)$ is replaced by a rule of the form $e_i \leftarrow e_{p_1}, \ldots, e_{p_n}$, and the rule of type 2, $be_i \leftarrow \mathrm{not}(e_i)$, is not created. Also the corresponding liveness rule of type 2 of the program $P_d(\beta)$, of the form $\leftarrow b_{p_1}, \ldots, b_{p_n}$, does not need to be created as far as the event $e_i$ is concerned.

3. Removal of redundant condition rules: The rule of type 1 of the program $P_d(\beta)$ corresponding to condition $b_i$ is removed if the atom $b_i$ does not appear elsewhere in the program.

4. Removal of redundant atoms: If a rule of the form $a_1 \leftarrow a_2$ would be generated, and this is the only rule in which $a_1$ appears as a head, then all instances of $a_1$ are replaced by $a_2$, and the rule is discarded.

5. Duplicate rule removal: Only one copy of each rule is generated. 
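Optimizations 4 and 5 can be illustrated on rule lists. The Python sketch below uses hypothetical helper names and encodes a rule as a (head, positive body, negative body) tuple. It rescans the program after every rewrite, so it is quadratic rather than linear like the implementation described in this section; it only illustrates the rewriting itself.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    head: Optional[str]  # None marks an integrity rule
    pos: tuple = ()
    neg: tuple = ()


def rename(rule, old, new):
    def swap(atoms):
        return tuple(new if a == old else a for a in atoms)
    head = new if rule.head == old else rule.head
    return Rule(head, swap(rule.pos), swap(rule.neg))


def remove_redundant_atoms(rules):
    """Optimization 4: drop 'a1 <- a2' if it is the only rule with head a1."""
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for i, r in enumerate(rules):
            if r.head is None or len(r.pos) != 1 or r.neg:
                continue
            a1, a2 = r.head, r.pos[0]
            if sum(q.head == a1 for q in rules) == 1:
                rules = [rename(q, a1, a2) for j, q in enumerate(rules) if j != i]
                changed = True
                break  # restart the scan on the rewritten program
    return rules


def remove_duplicates(rules):
    """Optimization 5: keep one copy of each rule, using hashing."""
    return list(dict.fromkeys(rules))


prog = [Rule("s1", ("b3",)), Rule(None, (), ("s1",)),
        Rule("b3", ("e1",), ("e4",)), Rule(None, ("b1",)), Rule(None, ("b1",))]
print(remove_duplicates(remove_redundant_atoms(prog)))
```

The fixed-point loop also handles chains such as $a \leftarrow b$, $b \leftarrow c$, which collapse step by step.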

For optimization 1 it is easy to see that the net system $\Sigma$ will deadlock, because the finite complete prefix is finite and does not contain any cut-offs; thus $\Sigma$ can fire only a finite number of transitions. It is also straightforward to prove that optimizations 3-5 do not alter the number of stable models of the program. Optimization 2 is motivated by stubborn sets [16]. The intuition is that whenever $e_i$ is enabled, it must be disabled in order to reach a deadlock. However, the only way of disabling $e_i$ is to fire it. Therefore we can discard all configurations in which $e_i$ is enabled as not being deadlock configurations.

We argue that optimization 2 is correct, i.e., the stable models of the program $P_d(\beta)$ are not affected by it (modulo the possible removal of the atom $be_i$ from the set of atoms of the optimized program). Consider the original program and an optimized one in which an event $e_i$ has been optimized using optimization 2. If we look only at the two programs without the deadlock detection parts added by Def. 4, their only difference is that in the original program it is possible to leave the event enabled but not fired, while this is not possible in the optimized program. Thus the set of stable models of the optimized program is clearly a subset of the stable models of the original one. If we have any configuration in which the event is enabled but not fired, then the set of atoms corresponding to this configuration is not a stable model of the original program, because the integrity rule of type 2 of Def. 4 corresponding to the event $e_i$ eliminates such a potential stable model. Therefore the optimized program has the same number of stable models as the original one.

We perform a fairly extensive set of optimizations. Optimizations 1 and 2 are specific to deadlock detection. Optimizations 3-5 can be seen as general logic program optimizations based on static analysis, and could in principle be done in the stable model generator after the translation. Optimizations 1-4 are implemented in linear time and space in the size of the prefix. Duplicate rule removal is implemented with hashing.

We use succinct rule encodings with extended rules when possible. The two rules $e_i \leftarrow e_{p_1}, \ldots, e_{p_n}, \mathrm{not}(be_i)$ and $be_i \leftarrow \mathrm{not}(e_i)$ can be more succinctly encoded by an extended rule of the form $\{e_i\} \leftarrow e_{p_1}, \ldots, e_{p_n}$. Also, $\leftarrow 2\{a_1, a_2\}$ is replaced by $\leftarrow a_1, a_2$. We also sort the rules after the translation. In our experiments the sorting seems to have only a minimal effect on the total running time, but it produces more readable logic program (debugging) output.
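The succinct encodings described above can be sketched as a rewriting pass. In this hypothetical Python sketch, rules are (head, positive body, negative body) tuples, `Card` stands for $\leftarrow k\{\ldots\}$ and `Choice` for the extended rule $\{e_i\} \leftarrow \ldots$. It assumes, as holds for the translations defined above, that an atom $be_i$ occurs only in the two rules being merged.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    head: Optional[str]  # None marks an integrity rule
    pos: tuple = ()
    neg: tuple = ()


class Card(NamedTuple):
    bound: int           # "<- k{atoms}"
    atoms: tuple


class Choice(NamedTuple):
    atom: str            # "{atom} <- pos": atom may freely be chosen true
    pos: tuple = ()


def compress(program):
    """Rewrite a program with the succinct encodings (a sketch)."""
    present = set(program)
    absorbed = set()  # blocking atoms whose defining rule gets merged
    for r in program:
        if isinstance(r, Rule) and r.head is not None:
            blocker = "b" + r.head
            if r.neg == (blocker,) and Rule(blocker, (), (r.head,)) in present:
                absorbed.add(blocker)
    out = []
    for r in program:
        if isinstance(r, Card):
            # "<- 2{a1, a2}" says the same as "<- a1, a2"
            binary = (r.bound, len(r.atoms)) == (2, 2)
            out.append(Rule(None, r.atoms) if binary else r)
        elif r.head in absorbed:
            continue  # "be <- not(e)" is folded into the choice rule
        elif r.head is not None and "b" + r.head in absorbed and r.neg == ("b" + r.head,):
            out.append(Choice(r.head, r.pos))  # "{e} <- body"
        else:
            out.append(r)
    return out


prog = [Rule("e2", ("e1",), ("be2",)), Rule("be2", (), ("e2",)),
        Card(2, ("e1", "e3")), Card(2, ("e1", "e2", "e3")), Rule(None, ("b1",))]
for rule in compress(prog):
    print(rule)
```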

After the translation has been created, the smodels computational engine is used to check whether a stable model of the program exists. If one exists, the deadlock checker uses the stable model found to output an example deadlock configuration. Otherwise it reports that the net is deadlock-free.



5.1 Experimental Results 

We have made experiments with our approach using examples by Corbett [2], McMillan [10, 11], and Melzer and Römer [12]. They were previously used by Melzer and Römer in [12] and by Best and Römer in [1], where additional information can be found. We compare our approach with two other deadlock checking methods based on finite complete prefixes. The first method is the branch-and-bound deadlock detection algorithm by McMillan [10, 11, 12], and the other is the mixed integer programming approach by Melzer and Römer [12].

Figures 2-4 present the running times in seconds for the various algorithms used in this work and, for the mixed integer programming approach, those presented in [12]. The running times have been measured using a Pentium 166MHz, 64MB RAM, 128MB swap, Linux 2.0.29, g++ 2.7.2.1, smodels pre-2.0.30, McMillan’s algorithm version 2.1.0 by Stefan Römer, and PEP 1.6g. The experiments with the mixed integer programming approach by Melzer and Römer used the commercial MIP solver CPLEX, and were conducted on a Sparcstation 20/712, 96MB RAM.

The rows of the tables correspond to different problems. The columns represent the sum of user and system times measured by the /usr/bin/time command, or times reported in [12], depending on the column:

- Unf = time for unfolding (creation of the finite complete prefix) (PEP).

- DCmip = time for the mixed integer programming approach in [12].

- DCmcm = time for McMillan’s algorithm, average of 4 runs.

- DCsmo = time for the smodels-based deadlock checker, average of 4 runs.

The marking vm(n) denotes that the program ran out of virtual memory after n seconds. The other fields of the figures are as follows: |B|: number of conditions, |E|: number of events, #c: number of cut-off events, DL: Y if the net system has a deadlock, CP: choice points, i.e., the number of nondeterministic guesses smodels made during the run. The DCsmo column also includes the logic program translation time, which was always under 10 seconds for the examples.
Problem (size)    |B|     |E|     #c  DL  CP   Unf(1)  DCmip(2)    DCmcm(1)  DCsmo(1)
DPD(5)           1582     790    211   N   0      0.6      17.3         1.6       1.0
DPD(6)           3786    1892    499   N   0      3.2      82.8        12.3       6.1
DPD(7)           8630    4314   1129   N   0     17.4     652.6       128.9      31.4
DPH(5)           2712    1351    547   N   0      1.3      42.9         6.5       1.8
DPH(6)          14474    7231   3377   N   0     33.7    1472.8      1063.7      32.9
DPH(7)          81358   40672  21427   N   0    929.3         -  vm(1690.2)     760.6
ELEVATOR(2)      1562     827    331   Y   2      0.6       2.3         0.5       0.7
ELEVATOR(3)      7398    3895   1629   Y   3     10.3      14.5        10.1      15.0
ELEVATOR(4)     32354   16935   7337   Y   4    186.1     387.8       268.8     231.7
FURNACE(1)        535     326    189   N   0      0.1       0.3         0.2       0.0
FURNACE(2)       5139    3111   1990   N   0      3.2      18.1        11.1       0.6
FURNACE(3)      34505   20770  13837   N   0    134.7    1112.5   vm(392.5)       7.1
RING(5)           339     167     37   N   0      0.1       1.3         0.1       0.1
RING(7)           813     403     79   N   0      0.2      17.1         0.2       0.4
RING(9)          1599     795    137   N   0      0.7      71.2         0.7       2.2
RW(6)             806     397    327   N   0      0.1       0.7         0.3       0.0
RW(9)            9272    4627   4106   N   0      2.0      58.5        68.2       0.4
RW(12)          98378   49177  45069   N   0    137.5   24599.9  vm(3050.5)       4.2

Fig. 2. Measured running times in seconds:
(1) = Pentium 166MHz, 64MB RAM, Linux 2.0.29.
(2) = Sparcstation 20/712, 96MB RAM [12].



The logic programming approach using the smodels system was able to produce an answer for all the examples presented here, while the implementation of McMillan’s algorithm ran out of virtual memory on some of the larger examples. Our approach was sometimes much faster; see e.g. FURNACE(3), RW(12), SYNC(3), BDS(1), GASQ(4), and Q(1). McMillan’s algorithm was faster than our approach on the following problem classes: RING, HART, SENT and SPD. These problems are quite easy for both methods: running times for the first three were a few seconds, and for the fourth still well under 30 seconds. On the DME and KEY examples our approach scales better as the problem sizes increase. McMillan’s algorithm is most competitive when the number of cut-off events is relatively small.

We do not have access to the MIP solver used in [12], and our experiments in [8] also seem to indicate that the computer we made our experiments on is faster than theirs. This makes it difficult to compare absolute running times between the different machines. However, our approach scales better on most examples; see e.g. the RW, DME, and SYNC examples.

An observation worth making is that the number of choice points for smodels in these examples is very low, with a maximum of 9 choice points in the example SPD(1). This means that on this example set the search space pruning techniques were very effective in minimizing the number of nondeterministic choices needed to solve the examples.
[Table: measured running times for further benchmark instances, including SYNC(3).]

Fig. 3. Measured running times in seconds:
(1) = Pentium 166MHz, 64MB RAM, Linux 2.0.29.
(2) = Sparcstation 20/712, 96MB RAM [12].



The example nets and C++ source code for our translation including smodels 
are available from: http://saturii.hut.fi/~kepa/experiments/tacas99/ 



6 Conclusions 

Our main contribution is a method to transform the deadlock and reachability 
problems for 1-safe Petri nets into the problem of finding a stable model of a logic 
program. We do this translation in two steps: (i) Existing methods and tools are 
used to generate a finite complete prefix of the 1-safe Petri net [5, 7, 10, 11]. 
(ii) The deadlock and reachability problems for the finite complete prefix are 
translated into the problem of finding a stable model of a logic program. This 
step uses the two new translations presented in this work, both of which are 
linear in the size of the prefix. 

We present experimental results to support the feasibility of this approach for 
the deadlock detection problem. We use an existing constraint-based logic pro- 
gramming framework, the smodels system, for solving the problem of finding a 
stable model of a logic program. Our experiments show that the approach seems 
to be quite robust and competitive on the examples available to us. More exper- 
iments are needed to evaluate the feasibility of the approach on the reachability 
problem. 

There are interesting topics for future research. It seems possible to extend the translations to allow a larger class of Petri nets to be translated, while still keeping the problem NP-complete. McMillan’s algorithm can be seen as a more goal-directed algorithm than our approach, and an alternative translation using the basic ideas of McMillan’s algorithm could be created. The smodels system is a fairly general-purpose search engine based on constraint propagation. Creating specialized algorithms for the two problems at hand could further improve the competitiveness of our approach. Applying our approach to some form of model checking is a very interesting area for future research.

[Table: measured running times for further benchmark instances.]

Fig. 4. Measured running times in seconds:
(1) = Pentium 166MHz, 64MB RAM, Linux 2.0.29.
Abstract. The π-calculus is a development of CCS that has the ability of communicating channel names. The asynchronous π-calculus is a variant of the π-calculus where message emission is non-blocking.

Finite state verification is problematic in this context, since even very simple asynchronous π-processes give rise to infinite-state behaviors. This is due both to phenomena that are typical of calculi with name passing and to phenomena that are peculiar to asynchronous calculi.

We present a finite-state characterization of a family of finitary asynchronous π-processes by exploiting History Dependent transition systems with Negative transitions (HDN), an extension of labelled transition systems particularly suited for dealing with concurrent calculi with name passing. We also propose an algorithm based on HDN to verify asynchronous bisimulation for finitary π-processes.



1 Introduction 

Growing interest has recently been devoted to calculi and languages for distributed systems, and in particular to the new phenomena they evidence. One of these phenomena is mobility: in large distributed systems, like the internet, there is mobility of hardware (when a computer is moved to a different node) and mobility of code and data (when applets are downloaded from the network and executed locally, or when remote programs are executed on local data).

The π-calculus [7,6] is a foundational calculus with mobility. In the π-calculus, processes can handle channel names as messages, thus modeling changes in their neighborhood. Furthermore, name passing is enough to simulate higher-order and object-oriented concurrent calculi, so mobility of code and of data can also be expressed in the π-calculus. In the original papers on the π-calculus [7,6], communications are synchronous, i.e., the emission and the reception of a message are assumed to happen at the same instant. More recently, an asynchronous version of the π-calculus has been defined [5,2]. Here it is assumed that messages take time to move from the sender to the receiver, and that the sender is not blocked until the message is received.

* Research partially supported by CNR Integrated Project “Metodi e Strumenti per la Progettazione e la Verifica di Sistemi Eterogenei Connessi mediante Reti di Comunicazione”, and Esprit WG CONFER2.



W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 255-270, 1999.
© Springer-Verlag Berlin Heidelberg 1999
While more expressive and more suitable to describe distributed systems, calculi with name passing give rise to new problems that cannot be solved by exploiting existing techniques for CCS-like process calculi. Here we focus on the problem of extending the techniques of finite state verification to (classes of) π-processes.

Finite state verification is successful in the case of concurrent systems, since 
interesting problems can be expressed by means of finite state systems. This is 
the case for instance of protocols, where the control part is often independent 
from the data part and can be verified with finite-state techniques. 

In this paper we face the problem of finite state verification for the asynchronous π-calculus. This is not a trivial problem, since naive approaches lead to infinite state systems even for very simple asynchronous π-processes. Different techniques have to be exploited to obtain finite state representations for interesting classes of processes. We now describe these techniques.

As a first step, we give a new definition of bisimilarity for the asynchronous π-calculus. In the classical asynchronous bisimulations proposed in [5,1], a lazy approach is used for output messages: since an arbitrary amount of time can be required for a message to be delivered, messages are never forced to be emitted from the system. In this way, however, infinite state systems are obtained for practically all recursive processes: in fact, if new messages are produced but never delivered, the size of the system can grow unboundedly.

We propose a different definition of bisimulation, which we have called hot-potato bisimulation, where messages are emitted as soon as they are ready (a similar approach is proposed in [12] for asynchronous systems without mobility). In this way, the system cannot grow unboundedly due to messages that are ready to be emitted but still undelivered. The classical, lazy asynchronous bisimulation and the new hot-potato bisimulation coincide.

Another cause of infiniteness is the generation of fresh names. This is a general phenomenon for the π-calculus: processes have the ability of dynamically creating new channels with the environment, and fresh names have to be associated to the new channels. Standard transition systems are not very convenient for dealing with allocation and deallocation of names: name creation is handled via the exposure of an internal name, which is subject to alpha conversion, and this results in infinite branching; moreover, if names are never removed, new states are produced at every cycle which includes a name generation. In [8] we proposed an enhanced version of labelled transition systems, which we called History Dependent (HD) transition systems, and a corresponding HD-bisimulation; names appear explicitly in states and labels of HD, so that name creation and deallocation can be explicitly represented.

While HD and HD-bisimilarity are adequate to describe the π-calculus with synchronous communications, a more general model is needed for the asynchronous π-calculus. In this paper we define History Dependent transition systems with Negative transitions (HDN) and HDN-bisimulation. We show that the asynchronous π-calculus can be represented by means of HDN and that finite state HDN are obtained for an important family of π-processes. In our opinion, HDN are a rather general model for mobile calculi; for instance, in [10] they are also applied to the early/late [7] and to the open [13,11] semantics of the π-calculus. We also believe that they can be applied to other calculi with mobility, like the join calculus [4].

Finally, we define an iterative method to calculate HDN-bisimilarity for a certain class of finite state HDN. This method resembles the partitioning approach for ordinary labelled transition systems [9], where a partition of the states is built and incrementally refined until all the states in the same block are equivalent. In general, HDN-bisimilarity is not guaranteed to be transitive, so it is not possible to build a partition of equivalent states. Therefore, our partitioning approach applies only to a class of redundancy-consistent HDN. Fortunately, the HDN corresponding to the asynchronous π-calculus are redundancy-consistent. Hence the partitioning method can be applied to verify the equivalence of finitary asynchronous π-processes.
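For comparison, the classical partitioning idea for ordinary finite labelled transition systems can be sketched in a few lines of Python. This is only a naive signature-based refinement for plain strong bisimilarity, chosen for brevity and not necessarily the algorithm of [9]; it does not implement HDN-bisimilarity. The function name and the input format (a set of (source, label, target) triples) are our own assumptions.

```python
def bisimulation_classes(states, transitions):
    """Naive partition refinement for strong bisimilarity on a finite LTS.

    transitions is a set of (source, label, target) triples; the result maps
    every state to a block number, equal numbers meaning bisimilar states.
    """
    block = {s: 0 for s in states}  # start from the trivial one-block partition
    while True:
        # A state's signature: its current block plus the (label, target block)
        # pairs it can reach in one step.
        sig = {s: (block[s],
                   frozenset((lab, block[t]) for (u, lab, t) in transitions if u == s))
               for s in states}
        ids = {}
        refined = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return refined  # no block was split: the partition is stable
        block = refined


# a.(b + c) versus a.b + a.c
S = ["p0", "p1", "p2", "p3", "q0", "q1", "q2", "q3", "q4"]
T = {("p0", "a", "p1"), ("p1", "b", "p2"), ("p1", "c", "p3"),
     ("q0", "a", "q1"), ("q0", "a", "q2"), ("q1", "b", "q3"), ("q2", "c", "q4")}
cls = bisimulation_classes(S, T)
print(cls["p0"] == cls["q0"])  # False
```

Because each signature includes the state's current block, every round refines the previous partition, so an unchanged block count means the partition is stable.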

2 The asynchronous π-calculus

Asynchronous processes are a subset of ordinary π-calculus processes. More precisely, output prefixes $\bar{a}b.P$ are not allowed in the asynchronous context: in $\bar{a}b.P$, in fact, process $P$ is blocked until message $\bar{a}b$ is emitted, while in the asynchronous context message emission is required to be non-blocking. Output prefixes are replaced by output particles $\bar{a}b$, which represent an output communication of name $b$ on channel $a$ that is ready to be delivered. For the same reason, outputs cannot appear in a choice point, so sums are restricted to τ and input prefixes.

Let $\mathcal{N}$ be an infinite, countable set of names, ranged over by $a, \ldots, z$. Asynchronous processes are defined by the syntax:

$$P, Q ::= \bar{a}b \;\mid\; G \;\mid\; P \,|\, P \;\mid\; (\nu a)\,P \;\mid\; A(a_1, \ldots, a_n) \qquad \text{(processes)}$$

$$G, H ::= \mathbf{0} \;\mid\; a(b).P \;\mid\; \tau.P \;\mid\; G + G \qquad \text{(guards)}$$

where we assume that a definition $A(b_1, \ldots, b_n) \stackrel{\mathrm{def}}{=} G_A$ corresponds to each process identifier $A$. All the occurrences of $b$ in $(\nu b)\,P$ and $a(b).P$ are bound; free and bound names of process $P$ are then defined as usual, and we denote them with $\mathrm{fn}(P)$ and $\mathrm{bn}(P)$ respectively.
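A process term of the syntax above can be represented as an abstract syntax tree. The following Python sketch (the class names are ours) computes $\mathrm{fn}(P)$ according to the binders just described. There is deliberately no constructor for an output prefix, so only output particles can be written, as the asynchronous calculus requires. The restriction of sums to guards is not enforced, and for identifiers it assumes, as usual, that $\mathrm{fn}(G_A) \subseteq \{b_1, \ldots, b_n\}$.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Nil:          # 0
    pass


@dataclass(frozen=True)
class Out:          # output particle: name msg sent on channel chan
    chan: str
    msg: str


@dataclass(frozen=True)
class In:           # input prefix chan(var).cont, binds var in cont
    chan: str
    var: str
    cont: object


@dataclass(frozen=True)
class Tau:          # tau.cont
    cont: object


@dataclass(frozen=True)
class Sum:          # G + H (the syntax only allows guards here)
    left: object
    right: object


@dataclass(frozen=True)
class Par:          # P | Q
    left: object
    right: object


@dataclass(frozen=True)
class New:          # (nu name) body, binds name in body
    name: str
    body: object


@dataclass(frozen=True)
class Call:         # A(a1, ..., an)
    ident: str
    args: Tuple[str, ...]


def fn(p):
    """Free names of a process term."""
    if isinstance(p, Nil):
        return set()
    if isinstance(p, Out):
        return {p.chan, p.msg}
    if isinstance(p, In):
        return {p.chan} | (fn(p.cont) - {p.var})
    if isinstance(p, Tau):
        return fn(p.cont)
    if isinstance(p, (Sum, Par)):
        return fn(p.left) | fn(p.right)
    if isinstance(p, New):
        return fn(p.body) - {p.name}
    if isinstance(p, Call):
        return set(p.args)  # assumes fn(G_A) is contained in the parameters
    raise TypeError(f"not a process term: {p!r}")


# (nu c)(nu e)(a<c> | b<c> | c<d> | e<f> | 0)
P = New("c", New("e", Par(Out("a", "c"), Par(Out("b", "c"), Par(
    Out("c", "d"), Par(Out("e", "f"), Nil()))))))
print(sorted(fn(P)))  # ['a', 'b', 'd', 'f']
```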

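We define a structural congruence $\equiv$ that identifies all those processes that differ only in inessential details in the syntax of the programs. Formally, we define $\equiv$ as the smallest congruence that satisfies the following rules:

$P \equiv Q$ if $P$ and $Q$ are alpha equivalent

$G + \mathbf{0} \equiv G \qquad G + G' \equiv G' + G \qquad G + (G' + G'') \equiv (G + G') + G''$

$P \,|\, \mathbf{0} \equiv P \qquad P \,|\, P' \equiv P' \,|\, P \qquad P \,|\, (P' \,|\, P'') \equiv (P \,|\, P') \,|\, P''$

$(\nu a)\,\mathbf{0} \equiv \mathbf{0} \qquad (\nu a)(\nu b)\,P \equiv (\nu b)(\nu a)\,P \qquad (\nu a)(P \,|\, Q) \equiv P \,|\, (\nu a)\,Q$ if $a \notin \mathrm{fn}(P)$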

The structural congruence is useful to obtain finite state representations for 
classes of processes. In fact, it can be used to garbage-collect terminated pro- 
cesses and unused restrictions. 
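The garbage collection mentioned above can be sketched for the fragment built from $\mathbf{0}$, output particles, parallel composition and restriction (guards and identifiers are omitted for brevity). The Python below uses our own class and function names. It applies $P \,|\, \mathbf{0} \equiv P$ and the derived law $(\nu a)\,P \equiv P$ for $a \notin \mathrm{fn}(P)$, which follows from $(\nu a)(P \,|\, \mathbf{0}) \equiv P \,|\, (\nu a)\,\mathbf{0} \equiv P \,|\, \mathbf{0}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nil:
    pass


@dataclass(frozen=True)
class Out:
    chan: str
    msg: str


@dataclass(frozen=True)
class Par:
    left: object
    right: object


@dataclass(frozen=True)
class New:
    name: str
    body: object


def fn(p):
    if isinstance(p, Nil):
        return set()
    if isinstance(p, Out):
        return {p.chan, p.msg}
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, New):
        return fn(p.body) - {p.name}
    raise TypeError(f"unsupported term: {p!r}")


def gc(p):
    """Remove terminated processes and unused restrictions, bottom-up."""
    if isinstance(p, Par):
        left, right = gc(p.left), gc(p.right)
        if isinstance(left, Nil):   # 0 | P == P | 0 == P
            return right
        if isinstance(right, Nil):
            return left
        return Par(left, right)
    if isinstance(p, New):
        body = gc(p.body)
        # (nu a) P == P when a is not free in P
        return body if p.name not in fn(body) else New(p.name, body)
    return p


print(gc(New("x", Par(Nil(), Out("a", "b")))))
```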
If $\sigma : \mathcal{N} \to \mathcal{N}$, we denote with $P\sigma$ the process $P$ whose free names have been replaced according to substitution $\sigma$ (possibly with changes in the bound names to avoid name clashing); we denote with $\{y_1/x_1, \ldots, y_n/x_n\}$ the substitution that maps $x_i$ into $y_i$ for $i = 1, \ldots, n$ and which is the identity on the other names. With some abuse of notation, we can see substitution $\sigma$ in $P\sigma$ as a function on $\mathrm{fn}(P)$ rather than on $\mathcal{N}$.

The actions that the processes can perform are the following:

$$\alpha ::= \tau \;\mid\; a(c) \;\mid\; \bar{a}b \;\mid\; \bar{a}(c)$$

and are called respectively synchronization, input, free output and bound output actions; $a$ and $b$ are free names of $\alpha$ ($\mathrm{fn}(\alpha)$), whereas $c$ is a bound name ($\mathrm{bn}(\alpha)$); moreover $\mathrm{n}(\alpha) = \mathrm{fn}(\alpha) \cup \mathrm{bn}(\alpha)$.

The operational semantics of the asynchronous π-calculus is defined by means of a labelled transition system. The transitions for the ground operational semantics are defined by the axiom schemata and the inference rules of the table on the top of this page. We recall that in the ground semantics no name instantiation occurs in the input transitions. In [1] it is shown that ground semantics coincides with early and late semantics in the case of the asynchronous π-calculus without matching.



2.1 Asynchronous bisimulation 

In this section we introduce asynchronous bisimulation. As we will see, while the management of τ and output transitions in the bisimulation game is standard, a special clause is needed for the input transitions; different characterizations of asynchronous bisimulation are proposed in [1], and they differ just in the way input transitions are dealt with. Following [1], we first define oτ-bisimulation, which just considers output and τ transitions; then we consider different notions of bisimulation that extend oτ-bisimulation with a clause for input transitions.

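Definition 1 (oτ-bisimulation [1]). A symmetric relation $\mathcal{R}$ on π-processes is an oτ-bisimulation if $P \mathrel{\mathcal{R}} Q$ and $P \xrightarrow{\alpha} P'$, where $\alpha$ is not an input transition and $\mathrm{bn}(\alpha) \cap \mathrm{fn}(P|Q) = \emptyset$, imply $Q \xrightarrow{\alpha} Q'$ and $P' \mathrel{\mathcal{R}} Q'$.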
Notice that the clause “$\mathrm{bn}(\alpha) \cap \mathrm{fn}(P|Q) = \emptyset$” in the definition above ensures that, when a channel name is extruded in a bound output transition, a fresh name (i.e., a name that is not used in $P$ or $Q$) is used to denote that channel.

In an asynchronous context, messages can be received by a process at any moment, even if the process is not ready to consume them: in [5] this intuition is modeled by allowing every process to accept every input message, i.e., according to the semantics of [5], $P \xrightarrow{a(b)} P \,|\, \bar{a}b$ is a valid transition for every process $P$. This approach has some drawbacks; the most important for our purposes is that every process (even process $\mathbf{0}$) can perform an infinite number of transitions, so finite state verification is not possible.

In this paper we follow instead the approach of [1]: an input transition $P \xrightarrow{a(b)} P'$ corresponds to the consumption of a message, i.e., to the execution of an input prefix. However, in the definition of asynchronous bisimulation, we cannot require that, given two bisimilar processes $P$ and $Q$, each input transition $P \xrightarrow{a(b)} P'$ is matched by a transition $Q \xrightarrow{a(b)} Q'$: process $Q$ can receive the message $\bar{a}b$ without consuming it, and still be equivalent to $P$. In asynchronous bisimulation [1], hence, a transition $P \xrightarrow{a(b)} P'$ can be matched either by a transition $Q \xrightarrow{a(b)} Q'$, with $P'$ and $Q'$ still bisimilar, or by a fictitious input transition of $Q$ that receives the message but does not consume it: this is modeled by requiring that $Q \xrightarrow{\tau} Q'$ (i.e., $Q$ performs some internal work), and that $P'$ is bisimilar to $Q' \,|\, \bar{a}b$ (process $Q' \,|\, \bar{a}b$ has received the message but has not yet consumed it).

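Definition 2 (ground asynchronous bisimulation [1]). A symmetric relation $\mathcal{R}$ on π-processes is a (ground) asynchronous bisimulation if it is an oτ-bisimulation such that $P \mathrel{\mathcal{R}} Q$ and $P \xrightarrow{a(b)} P'$ with $b \notin \mathrm{fn}(P|Q)$ imply

• either $Q \xrightarrow{a(b)} Q'$ and $P' \mathrel{\mathcal{R}} Q'$
• or $Q \xrightarrow{\tau} Q'$ and $P' \mathrel{\mathcal{R}} (Q' \,|\, \bar{a}b)$.

Two processes $P$ and $Q$ are asynchronous bisimilar, written $P \sim_a Q$, if $P \mathrel{\mathcal{R}} Q$ for some asynchronous bisimulation $\mathcal{R}$.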

In [1] some alternative characterizations of asynchronous bisimulation are proposed. One of them, namely 3-bisimulation, shows that it is possible to discover, by considering only the behavior of $P$, whether the input $P \xrightarrow{a(b)} P'$ is “redundant”, and to require that only the “non-redundant” input transitions of $P$ are matched in $Q$. The intuition is that an input transition is “redundant” if it is immediately followed by the emission of the received message.

Here we define a variant of 3-bisimulation, which we call 4-bisimulation. According to it, if process $P$ performs an input $P \xrightarrow{a(b)} P'$, but it can also perform a τ transition $P \xrightarrow{\tau} P''$ such that $P'$ and $P'' \,|\, \bar{a}b$ are bisimilar, then the input transition is redundant, and need not be matched by an equivalent process $Q$.

Definition 3 (4-bisimulation). A symmetric relation $\mathcal{R}$ on π-processes is a 4-bisimulation if it is an oτ-bisimulation such that $P \mathrel{\mathcal{R}} Q$ and $P \xrightarrow{a(b)} P'$ with $b \notin \mathrm{fn}(P|Q)$ imply

• either $Q \xrightarrow{a(b)} Q'$ and $P' \mathrel{\mathcal{R}} Q'$
• or $P \xrightarrow{\tau} P''$ and $P' \mathrel{\mathcal{R}} (P'' \,|\, \bar{a}b)$.

Two processes $P$ and $Q$ are 4-bisimilar, written $P \sim_4 Q$, if there is some 4-bisimulation $\mathcal{R}$ such that $P \mathrel{\mathcal{R}} Q$.

In our opinion 4-bisimulation is particularly interesting: each process can discover privately whether a transition is redundant, and when two transitions of different processes are matched, the labels are required to be exactly the same.

Proposition 1. Relations $\sim_a$ and $\sim_4$ coincide.

3 “Hot-potato” bisimulation 

Asynchronous bisimulation and its alternative characterizations discussed in the previous section are not amenable to finite state verification. In fact, infinite state systems are obtained for essentially all the interesting processes that can perform infinite computations. This happens because the messages generated during a computation are not forced to be emitted, even if their channels are not restricted; rather, they are simply put in parallel with the process. So every process that keeps generating output messages gives rise to an infinite state system.

We now define “hot-potato” bisimulation, which avoids this source of infiniteness. The key idea is to force the output particles to be emitted as soon as possible: consider process

$P = (\nu c)(\nu e)(\bar{a}c \mid \bar{b}c \mid \bar{c}d \mid \bar{e}f \mid G)$.

Output particles $\bar{a}c$ and $\bar{b}c$ can be emitted directly. Particle $\bar{c}d$ can be emitted only after name $c$ has been extruded by the emission of $\bar{a}c$ or of $\bar{b}c$. Particle $\bar{e}f$, finally, cannot be fired, since name $e$ is restricted and there are no output particles that extrude it. In what follows, whenever we need to identify the firable output particles of a process $P$ we use the notation $P = F \triangleleft P'$, where $F$ contains the firable output particles and the restrictions that are extruded by them, while $P'$ contains the blocked output particles and the control part. So, for instance, process $P$ can be decomposed as follows:

$P = (\nu c)(\bar{a}c \mid \bar{b}c \mid \bar{c}d) \triangleleft (\nu e)(\bar{e}f \mid G)$.

Up to structural congruence $\equiv$, the decomposition of $P$ into $F$ and $P'$ is unique.
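The firable part $F$ can be found by a simple fixed-point iteration over the output particles. The Python sketch below is only an illustration under simplifying assumptions of our own: a process is flattened into a set of restricted names plus a list of output particles `(channel, message)`, the control part and nested restrictions are ignored, and the function name `split_firable` is hypothetical. A particle becomes firable once its channel is free or has already been extruded by an earlier firable particle.

```python
def split_firable(restricted, outputs):
    """Split the output particles of (nu restricted)(outputs | control).

    restricted: set of restricted names.
    outputs: list of (channel, message) output particles.
    Returns (firable particles, extruded names, blocked particles).
    """
    extruded = set()
    firable = []
    pending = list(outputs)
    changed = True
    while changed:
        changed = False
        for particle in list(pending):
            channel, message = particle
            # firable if the channel is free or has already been extruded
            if channel not in restricted or channel in extruded:
                firable.append(particle)
                pending.remove(particle)
                # emitting a restricted name extrudes it
                if message in restricted:
                    extruded.add(message)
                changed = True
    return firable, extruded, pending
```

On the example above (restricted names $c$ and $e$, particles $\bar{a}c$, $\bar{b}c$, $\bar{c}d$, $\bar{e}f$) it returns the first three particles as $F$, reports $c$ as extruded, and leaves $\bar{e}f$ blocked.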

In hot-potato bisimulation the emission of a message takes precedence over input and synchronization transitions; that is, process $P$ cannot perform any input or synchronization transition until messages $\bar{a}c$, $\bar{b}c$ and $\bar{c}d$ have been emitted. Moreover, rather than emitting the output particles sequentially, the whole firable output $F$ of $F \triangleleft P'$ is emitted in one step.

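Definition 4 (hp-bisimulation). A symmetric relation $\mathcal{R}$ on π-processes is a hot-potato bisimulation (or hp-bisimulation) if $P \mathrel{\mathcal{R}} Q$ and $P = F \triangleleft P'$ with $bn(F) \cap fn(P|Q) = \emptyset$ imply $Q = F \triangleleft Q'$ and

- if $P' \xrightarrow{\tau} P''$ then $Q' \xrightarrow{\tau} Q''$ and $P'' \mathrel{\mathcal{R}} Q''$;

- if $P' \xrightarrow{a(b)} P''$ and $b \notin fn(P'|Q')$ then:

• either $Q' \xrightarrow{a(b)} Q''$ and $P'' \mathrel{\mathcal{R}} Q''$,
• or $P' \xrightarrow{\tau} P'''$ and $P'' \mathrel{\mathcal{R}} (P''' \mid \bar{a}b)$.

Two processes $P$ and $Q$ are hp-bisimilar, written $P \sim_{hp} Q$, if there is some hp-bisimulation $\mathcal{R}$ such that $P \mathrel{\mathcal{R}} Q$.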



Theorem 1. Relations $\sim_a$ and $\sim_{hp}$ coincide.

4 History dependent transition systems 

In this section we introduce a new operational model, the History Dependent transition systems with Negative transitions, HDN in brief; they are, in our opinion, more adequate than classical labelled transition systems for dealing with process calculi with name passing, like the asynchronous π-calculus.

As we have explained in the Introduction, classical labelled transition systems have difficulties in modelling the creation of fresh names: for instance, in the ordinary operational semantics of the π-calculus, infinite bunches of transitions are necessary. This problem is addressed by HDN, where states and labels are enriched with sets of names, which are now an explicit component of the model. Moreover, each state of a HDN is used to denote a whole family of π-processes that differ by injective renamings, and a single transition is sufficient to model the creation of a fresh name. This is obtained by representing explicitly the correspondence between the names of the source, label and target of each transition; in the ordinary labelled transition system, the correspondence between these names is the syntactical identity, and this requires distinguishing states and transitions that differ in the syntactical identity of their names.

All these features are also present in HD [8] . The original element of HDN 
is the presence of negative transitions: these are used to determine whether a 
transition is redundant or not. The intuition is that a transition is redundant 
if there is a negative transition from the same state, with the same label, and 
such that the two target states are bisimilar. That is, a negative transition from 
a state cancels the “equivalent” positive transitions from that state. 

Definition 5 (HDN). A History Dependent transition system with Negative transitions, or HDN, is a tuple $A = (\mathcal{Q}, \mathcal{L}, \mu, \longmapsto, \dashrightarrow)$ where:

- $\mathcal{Q}$ is a set of states and $\mathcal{L}$ is a set of labels; we assume that $\mathcal{Q} \cap \mathcal{L} = \emptyset$;

- $\mu : \mathcal{L} \cup \mathcal{Q} \to \mathcal{P}_{fin}(\mathcal{N})$ associates to each state and label a finite set of names;

- $\longmapsto$ is the (positive) transition relation and $\dashrightarrow$ is the negative transition relation; if $Q \overset{\lambda}{\longmapsto}_{\sigma} Q'$ (resp. $Q \overset{\lambda}{\dashrightarrow}_{\sigma} Q'$) then:

• $Q, Q' \in \mathcal{Q}$ are the source and target states,

• $\lambda \in \mathcal{L}$ is the label,

• $\sigma : \mu(Q') \hookrightarrow \mu(Q) \cup \mu(\lambda)$ is an injective embedding of the names of the target state into the names of the source state and of the label.

We assume that the set of labels is closed under injective renamings, i.e., for each label $\lambda \in \mathcal{L}$ and each injective renaming $\rho : \mu(\lambda) \hookrightarrow \mathcal{N}$, we assume that a label $\lambda\rho \in \mathcal{L}$ is defined. The following properties of renamings on labels must be satisfied: $\mu(\lambda\rho) = \rho(\mu(\lambda))$, $(\lambda\rho)\rho' = \lambda(\rho;\rho')$, and $\lambda\rho = \lambda$ if $\rho = id_{\mu(\lambda)}$.

4.1 A HDN for the asynchronous π-calculus

In this section we define the HDN $\Pi$ corresponding to the asynchronous π-calculus; the “hot potato” semantics is exploited for this purpose.

In this case, the states $\mathcal{Q}_\Pi$ have two forms: they are $(0,P)$ and $(1,P)$. In a state of the form $(0,P)$ the emission of the output messages has still to be performed, while in a state $(1,P)$ it has already happened and process $P$ can perform input and synchronization transitions. In both cases, the names associated to the state are $fn(P)$.

In $\Pi$ all the π-processes that differ only by an injective renaming are collapsed into a single state. To this purpose, we assume that a canonical representative is given for each class of processes that differ by injective renamings, together with a function norm that, given a process $P$, returns a pair $norm(P) = (Q, \sigma)$, where $Q$ is the canonical representative of the class of processes that differ from $P$ by an injective renaming, and $\sigma : fn(Q) \hookrightarrow fn(P)$ is the injective renaming such that $P = Q\sigma$.
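One simple way to obtain such canonical representatives is to rename names in order of first occurrence. The Python sketch below assumes, as a simplification of our own, that a process is a tuple of particles `(kind, names)` in which every name occurrence is treated as free and structural congruence is ignored; `rename` and this `norm` are hypothetical helpers, not the paper's implementation. Any two terms that differ by an injective renaming get the same representative, and `sigma` maps canonical names back to the original ones.

```python
def rename(term, sub):
    """Apply the name substitution `sub` (a dict) to every name in `term`."""
    return tuple((kind, tuple(sub.get(n, n) for n in names)) for kind, names in term)


def norm(term):
    """Return (Q, sigma) with Q canonical and rename(Q, sigma) == term."""
    to_canon = {}
    for _, names in term:
        for n in names:
            if n not in to_canon:
                # names are numbered in order of first occurrence
                to_canon[n] = f"n{len(to_canon)}"
    canonical = rename(term, to_canon)
    sigma = {c: n for n, c in to_canon.items()}  # injective: canonical -> original
    return canonical, sigma
```

For instance, the particles `("out", ("a", "c"))`, `("in", ("c", "x"))` normalize to `("out", ("n0", "n1"))`, `("in", ("n1", "n2"))`, and so does any injective renaming of them.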

The transitions in $\Pi$ from a state $(1,P)$ correspond to the synchronization and input actions of process $P$. While all the $\tau$ transitions of $P$ have to be represented in $\Pi$, it is not necessary to take all the input transitions; rather, it is sufficient to take just one canonical representative for each bunch of input transitions. In this case, a policy for allocating the fresh names has to be chosen. Since $\mathcal{N}$ is countable, we can take the first name that does not already appear in process $P$ whenever a transition from $P$ requires the generation of a fresh name.

So, we say that transition $P \xrightarrow{a(b)} P'$ is canonical if $b = \min(\mathcal{N} \setminus fn(P))$.
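The allocation policy can be illustrated with a tiny Python sketch, assuming (our assumption, not the paper's) that $\mathcal{N}$ is enumerated as the natural numbers $0, 1, 2, \ldots$; the helper names are hypothetical. Each bunch of input transitions on a channel $a$ is then represented by the single label $a(b)$ with $b$ the least name not free in $P$.

```python
def canonical_fresh(free_names):
    """Least natural number not in free_names, i.e. min(N minus fn(P))."""
    b = 0
    while b in free_names:
        b += 1
    return b


def canonical_input_labels(input_channels, free_names):
    """One canonical label (a, b) for each input channel a of P."""
    b = canonical_fresh(free_names)
    return [(a, b) for a in sorted(input_channels)]
```

Because the same fresh name is chosen for every channel, two processes with the same free names produce identical canonical input labels.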

Whenever a process $P$ can perform both an input transition $P \xrightarrow{a(b)} P'$ and a $\tau$ transition $P \xrightarrow{\tau} P''$, we have to take into account that the input transition is redundant if $P'$ and $P'' \mid \bar{a}b$ are bisimilar. For this purpose, a negative transition with label $a(b)$ is added to $\Pi$.

In $\Pi$ there is exactly one transition from state $(0,P)$, which corresponds to the emission of the firable messages. If $P = F \triangleleft P'$, then $F$ is observed as the label of the transition. Since component $F$ of a process $P$ is unique only up to structural congruence, we assume canonical representatives for these composed output messages, and we call $P = F \triangleleft P'$ a canonical decomposition if $F$ is a canonical representative.

Notice that the names $\mu(F)$ that correspond to label $F$ are not only the free names of $F$, but also its restricted ones. So, if the injective substitution $\rho$ is applied to $F$, not only the free names are changed according to $\rho$, but also the restricted ones.
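Definition 6 (HDN for the asynchronous π-calculus). The HDN $\Pi$ for the “hot potato” asynchronous π-calculus is defined as follows:

- $\mathcal{Q}_\Pi = \{(0,P) \mid P \text{ is a canonical π-process}\} \cup \{(1,P) \mid P \text{ is a canonical π-process without firable messages}\}$ and $\mu((0,P)) = \mu((1,P)) = fn(P)$;

- $\mathcal{L}_\Pi = \{\tau\} \cup \{a(b) \mid a,b \in \mathcal{N}\} \cup \{F \mid F \text{ is canonical}\}$ and $\mu(\tau) = \emptyset$, $\mu(a(b)) = \{a,b\}$, $\mu(F) = fn(F) \cup bn(F)$;

- if $(0,Q) \in \mathcal{Q}_\Pi$, $Q = F \triangleleft Q'$ is a canonical decomposition, and $norm(Q') = (Q'',\sigma)$, then $(0,Q) \overset{F}{\longmapsto}_{\sigma} (1,Q'')$;

- if $(1,Q) \in \mathcal{Q}_\Pi$, $Q \xrightarrow{\tau} Q'$ is a transition, and $norm(Q') = (Q'',\sigma)$, then $(1,Q) \overset{\tau}{\longmapsto}_{\sigma} (0,Q'')$;

- if $(1,Q) \in \mathcal{Q}_\Pi$, $Q \xrightarrow{a(b)} Q'$ is a canonical transition, and $norm(Q') = (Q'',\sigma)$, then $(1,Q) \overset{a(b)}{\longmapsto}_{\sigma} (0,Q'')$;

- if $(1,Q) \in \mathcal{Q}_\Pi$, $Q \xrightarrow{a(b)} Q'$ is a canonical transition, $Q \xrightarrow{\tau} Q''$ is a transition, and $norm(Q'' \mid \bar{a}b) = (Q''',\sigma)$, then $(1,Q) \overset{a(b)}{\dashrightarrow}_{\sigma} (0,Q''')$.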

Definition 7 (finitary processes). Let $P$ be an asynchronous π-process and let $norm(P) = (Q,\sigma)$. Process $P$ is finitary if and only if a finite number of states in $\Pi$ are reachable from $(0,Q)$.

4.2 HDN-bisimulation 

In this section we define bisimulation on HDN. In this case, bisimulations cannot simply be relations on the states; they must also deal with name correspondences: a HDN-bisimulation is a set of triples of the form $(Q_1, \delta, Q_2)$ where $Q_1$ and $Q_2$ are states of the HDN and $\delta$ is a partial bijection between the names of the states. The bijection is partial since bisimilar states of a HDN can have a different number of names (in fact, bisimilar π-processes can have different sets of free names).

Notation 1. We represent with $f : A \rightharpoonup B$ a partial bijection from set $A$ to set $B$ and with $f : A \leftrightarrow B$ a total bijection from set $A$ to set $B$. We denote with $f;g$ the concatenation of $f$ and $g$ and with $f^{-1}$ the inverse of $f$.

Suppose that we want to check whether states $Q_1$ and $Q_2$ are bisimilar via the partial bijection $\delta : \mu(Q_1) \rightharpoonup \mu(Q_2)$ and suppose that $Q_1$ can perform a transition $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1'$. There are two alternatives:

- State $Q_2$ matches the transition of $Q_1$ with a transition $Q_2 \overset{\lambda_2}{\longmapsto}_{\sigma_2} Q_2'$ such that labels $\lambda_1$ and $\lambda_2$ coincide up to a bijective renaming $\rho$, and states $Q_1'$ and $Q_2'$ are still bisimilar via a partial bijection $\delta'$. Clearly, name correspondences $\delta$, $\rho$ and $\delta'$ have to be related. More precisely, $\rho$ has to coincide with $\delta$ on the names that appear both in the label and in the source state (in fact, $\rho$ is used to extend $\delta$ to the fresh names that are introduced in the transition) and all the pairs of names that appear in $\delta'$ must appear, via the embeddings $\sigma_1$ and $\sigma_2$, either in $\delta$ or in $\rho$.

- Transition $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1'$ is redundant, i.e., there is some negative transition $Q_1 \overset{\lambda_1'}{\dashrightarrow}_{\sigma_1'} Q_1''$ such that labels $\lambda_1$ and $\lambda_1'$ coincide up to a bijective renaming $\rho$, and states $Q_1'$ and $Q_1''$ are bisimilar via a partial bijection $\delta'$. Also in this case, name correspondences $id_{\mu(Q_1)}$, $\rho$ and $\delta'$ are related.

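Definition 8 (redundant transitions). Let $\mathcal{R}$ be a symmetric set of triples on HDN $A$. Transition $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1'$ is redundant for $\mathcal{R}$, written $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1' \in red[\mathcal{R}]$, if there exist some negative transition $Q_1 \overset{\lambda_1'}{\dashrightarrow}_{\sigma_1'} Q_1''$ and some $\rho : \mu(\lambda_1) \leftrightarrow \mu(\lambda_1')$ such that

- $\rho \cap (\mu(Q_1) \times \mu(Q_1)) = id_{\mu(Q_1)} \cap (\mu(\lambda_1) \times \mu(\lambda_1'))$;

- $\lambda_1' = \lambda_1\rho$;

- $(Q_1', \delta', Q_1'') \in \mathcal{R}$ for some $\delta' \subseteq \sigma_1;(id_{\mu(Q_1)} \cup \rho);\sigma_1'^{-1}$.

If transition $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1'$ is not redundant for $\mathcal{R}$, then we say that it is non-redundant for $\mathcal{R}$ and we write $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1' \notin red[\mathcal{R}]$.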

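Definition 9 (HDN-bisimulation). A symmetric set of triples $\mathcal{R}$ on HDN $A$ is a HDN-bisimulation if $(Q_1, \delta, Q_2) \in \mathcal{R}$ implies that for each transition $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1' \notin red[\mathcal{R}]$ there exist some transition $Q_2 \overset{\lambda_2}{\longmapsto}_{\sigma_2} Q_2'$ and some $\rho : \mu(\lambda_1) \leftrightarrow \mu(\lambda_2)$ such that

- $\rho \cap (\mu(Q_1) \times \mu(Q_2)) = \delta \cap (\mu(\lambda_1) \times \mu(\lambda_2))$;

- $\lambda_2 = \lambda_1\rho$;

- $(Q_1', \delta', Q_2') \in \mathcal{R}$ for some $\delta' \subseteq \sigma_1;(\delta \cup \rho);\sigma_2^{-1}$.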
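The name side conditions of Definition 9 can be checked mechanically once all name correspondences are encoded as finite relations. The Python sketch below is a hypothetical helper, not part of the paper: relations are sets of pairs, the embeddings $\sigma_1$ and $\sigma_2$ are given as sets of pairs (target name, source-or-label name), and $f;g$ is read in diagrammatic order (first $f$, then $g$). The label condition $\lambda_2 = \lambda_1\rho$ depends on the concrete label syntax and is left to the caller.

```python
def rel_compose(r, s):
    """Relational composition r;s (diagrammatic order: first r, then s)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def rel_inverse(r):
    return {(y, x) for (x, y) in r}


def restrict(r, dom, cod):
    """r intersected with dom x cod."""
    return {(x, y) for (x, y) in r if x in dom and y in cod}


def is_total_bijection(r, dom, cod):
    return (
        {x for x, _ in r} == set(dom)
        and {y for _, y in r} == set(cod)
        and len(r) == len(dom) == len(cod)
    )


def names_match(mu_q1, mu_q2, delta, mu_l1, mu_l2, rho, sigma1, sigma2, delta_p):
    """Name conditions of one matching step (label equality is not checked)."""
    if not is_total_bijection(rho, mu_l1, mu_l2):
        return False
    # rho must agree with delta on names shared by the label and the source state
    if restrict(rho, mu_q1, mu_q2) != restrict(delta, mu_l1, mu_l2):
        return False
    # every pair in delta' must come, via sigma1 and sigma2, from delta or rho
    allowed = rel_compose(rel_compose(sigma1, delta | rho), rel_inverse(sigma2))
    return delta_p <= allowed
```

For an input on a known channel that receives a fresh name, $\rho$ extends $\delta$ to the fresh names, and $\delta'$ may only contain the pairs obtained by following $\sigma_1$, then $\delta \cup \rho$, then $\sigma_2^{-1}$.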

Proposition 2. If $\mathcal{R}_i$ with $i \in I$ are HDN-bisimulations for some HDN, then $\bigcup_{i \in I} \mathcal{R}_i$ is also a HDN-bisimulation.

This proposition guarantees the existence of the largest HDN-bisimulation for a HDN $A$. We denote with $\sim_A$ this largest HDN-bisimulation. Moreover, if $(Q_1, \delta, Q_2) \in \sim_A$ then we say that states $Q_1$ and $Q_2$ are HDN-bisimilar according to $\delta$.

The following theorem shows that HDN-bisimulation on $\Pi$ captures exactly asynchronous bisimulation.

Theorem 2. Let $P_1$ and $P_2$ be two π-processes and let $norm(P_1) = (Q_1,\sigma_1)$ and $norm(P_2) = (Q_2,\sigma_2)$. Then $P_1 \sim_a P_2$ if and only if $(0,Q_1)$ and $(0,Q_2)$ are HDN-bisimilar in $\Pi$ according to $\sigma_1;\sigma_2^{-1}$.

4.3 Iterative characterization of HDN-bisimulation 

In this section we show that, for a class of finite state HDN, the largest bisimulation can be effectively built with an iterative algorithm that resembles the partition refinement techniques of classical labelled transition systems [9].

As a first step, we characterize HDN-bisimulations on HDN $A$ as the pre-fixed points of a monotone functor $\Phi_A$.
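Definition 10 (functor $\Phi_A$). Functor $\Phi_A$ on symmetric sets of triples $\mathcal{R}$ on HDN $A$ is defined as follows: $(Q_1, \delta, Q_2) \in \Phi_A(\mathcal{R})$ if and only if for each $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1' \notin red[\mathcal{R}]$ there exist some transition $Q_2 \overset{\lambda_2}{\longmapsto}_{\sigma_2} Q_2'$ and some $\rho : \mu(\lambda_1) \leftrightarrow \mu(\lambda_2)$ such that

- $\rho \cap (\mu(Q_1) \times \mu(Q_2)) = \delta \cap (\mu(\lambda_1) \times \mu(\lambda_2))$;

- $\lambda_2 = \lambda_1\rho$;

- $(Q_1', \delta', Q_2') \in \mathcal{R}$ where $\delta' \subseteq \sigma_1;(\delta \cup \rho);\sigma_2^{-1}$.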

Fact 1. Set of triples $\mathcal{R}$ is a HDN-bisimulation for $A$ if and only if it is a pre-fixed point of functor $\Phi_A$.

Lemma 1. Functor $\Phi_A$ is monotone. Moreover, if the HDN $A$ is finite branching, functor $\Phi_A$ is continuous.

The fact that functor $\Phi_A$ is continuous for finite branching HDN (and hence in particular for finite state HDN) guarantees that the largest bisimulation $\sim_A$ can be obtained by the iterated application of $\Phi_A$ starting from the universal set of triples $U_A = \{(Q_1, \delta, Q_2) \mid Q_1, Q_2 \in \mathcal{Q},\ \delta : \mu(Q_1) \rightharpoonup \mu(Q_2)\}$.

Corollary 1. Let $A$ be a finite branching HDN. Then $\sim_A = \bigcap_{n \in \mathbb{N}} \Phi_A^n(U_A)$.

This result can be exploited to obtain an algorithm that builds $\sim_A$ whenever $A$ is a finite state HDN. However, this approach is not very efficient, since it involves the manipulation of large sets of triples: even when $\sim_A$ is very small, the algorithm starts from a set of triples $U_A$ that is very large.

A similar situation also happens in the case of bisimulation for ordinary 
finite state transition systems: all the states are considered equivalent in the 
beginning, and this universal relation is refined by the repeated application of 
a functor. In that case, however, all the approximations built by the algorithm 
are equivalences and can be efficiently represented by partitions of the states 
in equivalence blocks. So, for instance, the initial relation is represented in a 
compact way by the singleton partition, where all the states are in the same 
block. 
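For comparison, the classical partition-based approach for ordinary finite labelled transition systems can be sketched as follows. This is a naive signature-based refinement written for illustration (it is not the more efficient algorithm of Paige and Tarjan, and the function name is hypothetical): it starts from the singleton partition and splits blocks by the set of (label, target block) pairs of each state until the number of blocks stops growing.

```python
def bisimulation_partition(states, transitions):
    """Naive partition refinement for strong bisimilarity on a finite LTS.

    states: list of states; transitions: iterable of (source, label, target).
    Returns a dict mapping each state to a block index.
    """
    transitions = list(transitions)
    block = {s: 0 for s in states}  # singleton partition: a single block
    while True:
        signature = {}
        for s in states:
            moves = frozenset((lab, block[t]) for (src, lab, t) in transitions if src == s)
            # keeping the old block in the signature makes each step a refinement
            signature[s] = (block[s], moves)
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: the partition is stable
        block = new_block
```

Each round either splits some block or stops, so there are at most as many rounds as states, mirroring the bound discussed for the HDN algorithm below.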

To develop an efficient algorithm for HDN-bisimulation, it would be important to apply partitioning-like techniques also in this context. Unfortunately, in general the approximations $\Phi_A^n(U_A)$, and in particular the largest HDN-bisimulation, are not transitively closed. Consider in fact the following very simple HDN $A$, where no names are associated to states and labels:



[Figure: HDN $A$, with states $P$, $Q$ and $R$]
It holds that $(P, \emptyset, Q) \in \sim_A$ (since $P$ and $Q$ have the same positive transitions) and that $(Q, \emptyset, R) \in \sim_A$ (since the only positive transition of $Q$ is clearly redundant). It is not true, however, that $(P, \emptyset, R) \in \sim_A$, since $R$ is not able to match the positive transition of $P$.
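Since the drawing of $A$ did not survive, the following Python sketch rests on an assumed reconstruction that is consistent with the text and with the later discussion of $P'$, $Q'$ and $Q''$: $P$ has one positive $\lambda$-transition to $P'$; $Q$ has a positive $\lambda$-transition to $Q'$ and a negative $\lambda$-transition to $Q''$; $R$, $P'$, $Q'$ and $Q''$ have no transitions (written `P1`, `Q1`, `Q2` in the code). Because no names are involved, triples reduce to pairs. The function iterates a symmetric version of $\Phi_A$ from the universal relation, as in Corollary 1; its name and the encoding of transitions as `(source, label, target)` triples are our own.

```python
from itertools import product


def largest_hdn_bisimulation(states, pos, neg):
    """Greatest fixed point of (symmetric) Phi on a nameless HDN.

    pos, neg: sets of (source, label, target) positive / negative transitions.
    """
    rel = set(product(states, states))  # universal relation U_A

    def non_redundant(s, r):
        # a positive move is redundant if a negative move with the same label
        # leads to a state related (by r) to its target
        return [
            (lab, tgt)
            for (src, lab, tgt) in pos
            if src == s
            and not any(n_src == s and n_lab == lab and (tgt, n_tgt) in r
                        for (n_src, n_lab, n_tgt) in neg)
        ]

    def matched(s, t, r):
        # every non-redundant move of s is matched by some positive move of t
        return all(
            any(src == t and lab2 == lab and (tgt, tgt2) in r for (src, lab2, tgt2) in pos)
            for (lab, tgt) in non_redundant(s, r)
        )

    while True:
        # checking both directions keeps the relation symmetric
        new_rel = {(s, t) for (s, t) in rel if matched(s, t, rel) and matched(t, s, rel)}
        if new_rel == rel:
            return rel
        rel = new_rel
```

On this assumed HDN the result relates `P` with `Q` and `Q` with `R`, but not `P` with `R`, reproducing the failure of transitivity.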
These problems occur since in the definition of $\Phi_A$, as well as in the definition of HDN-bisimulation, it is not required that a non-redundant transition of $Q_1$ is matched by a non-redundant transition of $Q_2$. Now we define functor $\Psi_A$, where this correspondence between non-redundant transitions is forced. Therefore, the approximations obtained by iterating functor $\Psi_A$ are transitively closed. However, this functor differs from $\Phi_A$ in general and, even worse, it is non-monotone (so there is no guarantee that the approximations will ever converge).

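Definition 11 (functor $\Psi_A$). Functor $\Psi_A$ on a symmetric set of triples $\mathcal{R}$ is defined as follows: $(Q_1, \delta, Q_2) \in \Psi_A(\mathcal{R})$ if and only if for each $Q_1 \overset{\lambda_1}{\longmapsto}_{\sigma_1} Q_1' \notin red[\mathcal{R}]$ there exist some transition $Q_2 \overset{\lambda_2}{\longmapsto}_{\sigma_2} Q_2' \notin red[\mathcal{R}]$ and some $\rho : \mu(\lambda_1) \leftrightarrow \mu(\lambda_2)$ such that

- $\rho \cap (\mu(Q_1) \times \mu(Q_2)) = \delta \cap (\mu(\lambda_1) \times \mu(\lambda_2))$;

- $\lambda_2 = \lambda_1\rho$;

- $(Q_1', \delta', Q_2') \in \mathcal{R}$ where $\delta' \subseteq \sigma_1;(\delta \cup \rho);\sigma_2^{-1}$.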

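Consider HDN $A$ above and let $\mathcal{R} = \{(P', \emptyset, Q')\}$ and $\mathcal{S} = \{(P', \emptyset, Q'), (Q', \emptyset, Q'')\}$. Then clearly $\mathcal{R} \subseteq \mathcal{S}$. However, $(P, \emptyset, Q) \in \Psi_A(\mathcal{R})$ but $(P, \emptyset, Q) \notin \Psi_A(\mathcal{S})$, so $\Psi_A(\mathcal{R}) \not\subseteq \Psi_A(\mathcal{S})$. In the case of HDN $A$, therefore, functor $\Psi_A$ is not monotone.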
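The counterexample can be replayed mechanically. The Python sketch below again assumes a reconstruction of the lost figure that is consistent with the text ($P$ has a positive $\lambda$-move to $P'$; $Q$ has a positive $\lambda$-move to $Q'$ and a negative $\lambda$-move to $Q''$; there are no other transitions) and writes $P'$, $Q'$, $Q''$ as `P1`, `Q1`, `Q2`. The function `psi` is a hypothetical one-step implementation of a symmetric version of $\Psi_A$ for nameless HDN, and the symmetric sets $\mathcal{R}$ and $\mathcal{S}$ are given together with their converse pairs.

```python
def psi(states, pos, neg, rel):
    """One application of (symmetric) Psi to a symmetric relation on a nameless HDN."""

    def non_redundant(s):
        return [
            (lab, tgt)
            for (src, lab, tgt) in pos
            if src == s
            and not any(n_src == s and n_lab == lab and (tgt, n_tgt) in rel
                        for (n_src, n_lab, n_tgt) in neg)
        ]

    def matched(s, t):
        # non-redundant moves of s must be matched by NON-redundant moves of t
        return all(
            any(lab2 == lab and (tgt, tgt2) in rel for (lab2, tgt2) in non_redundant(t))
            for (lab, tgt) in non_redundant(s)
        )

    return {(s, t) for s in states for t in states if matched(s, t) and matched(t, s)}
```

Enlarging $\mathcal{R}$ to $\mathcal{S}$ makes the move of $Q$ redundant, so $P$ loses its only admissible partner move and the pair $(P, Q)$ drops out of the result.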

While different in general, there are classes of finite branching HDN on which functors $\Phi_A$ and $\Psi_A$ compute the same sequence of approximations. This situation is very convenient, since in this case the advantages of both functors hold: the functor is continuous, so that the largest HDN-bisimulation is captured by iterating it, and the approximations are transitively closed, which implies that the largest bisimulation is also transitively closed. Fortunately, all the interesting HDN that we have considered are redundancy-consistent. In particular, this is the case for $\Pi$.

Definition 12 (redundancy-consistent HDN). The finite branching HDN $A$ is redundancy-consistent if $\Phi_A^n(U_A) = \Psi_A^n(U_A)$ for all $n \in \mathbb{N}$.

Proposition 3. All the approximations $\mathcal{R} = \Phi_A^n(U_A)$ of a redundancy-consistent HDN $A$ are transitively closed, i.e., $(Q_1, \delta_{12}, Q_2) \in \mathcal{R}$ and $(Q_2, \delta_{23}, Q_3) \in \mathcal{R}$ imply $(Q_1, \delta_{12};\delta_{23}, Q_3) \in \mathcal{R}$.

Theorem 3. The HDN $\Pi$ is redundancy-consistent.

Each transitively closed set of triples $\mathcal{R}$ induces a partition of the states into equivalence classes. However, to characterize $\mathcal{R}$ it is still necessary to represent all the name correspondences between all the pairs of states in the same block. Now we show that these correspondences can be represented in a compact way by exploiting active names. At every step of the iteration of functor $\Phi_A$ there are names of a state that have played no active role in the game of matching transitions, since they have not yet appeared in the labels of the transitions considered for that state. Therefore, any correspondence can exist between the “inactive” names of equivalent states.
Definition 13 (active names). Let $A$ be a HDN. The family of functions $an_A^n : \mathcal{Q} \to \mathcal{P}_{fin}(\mathcal{N})$, with $n \in \mathbb{N}$, is defined as follows: $an_A^0(Q) = \emptyset$, and

$an_A^{n+1}(Q) = \bigcup_{Q \overset{\lambda}{\longmapsto}_{\sigma} Q' \notin red[\Phi_A^n(U_A)]} \big(\mu(\lambda) \cup \sigma(an_A^n(Q'))\big) \cap \mu(Q)$.

Notice that only the transitions that are non-redundant w.r.t. $\Phi_A^n(U_A)$ are considered for computing the active names at the $(n+1)$-th step: only those transitions, in fact, are considered in the $(n+1)$-th application of functor $\Phi_A$. Also, the intersection with $\mu(Q)$ is necessary, since a transition can introduce new names that do not appear in the source state.
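A direct transcription of Definition 13 into Python might look as follows. It is a sketch under our own encoding: each transition is a tuple (source, label names, embedding $\sigma$ as a dict from target names to source-or-label names, target), and whether a transition is non-redundant w.r.t. $\Phi_A^n(U_A)$ is supplied by a hypothetical predicate `non_redundant(tr, n)`, which by default treats every transition as non-redundant.

```python
def active_names(mu, transitions, steps, non_redundant=lambda tr, n: True):
    """Compute an^steps for every state of a HDN.

    mu: dict state -> set of names.
    transitions: list of (source, label_names, sigma, target); sigma is a dict
      embedding the names of target into the names of source and label.
    non_redundant(tr, n): hypothetical predicate for redundancy w.r.t. step n.
    """
    an = {q: frozenset() for q in mu}  # an^0(Q) is empty
    for n in range(steps):
        new_an = {}
        for q in mu:
            acc = set()
            for tr in transitions:
                src, label_names, sigma, tgt = tr
                if src != q or not non_redundant(tr, n):
                    continue
                acc |= set(label_names)
                acc |= {sigma[x] for x in an[tgt]}  # names active in the target
            # drop fresh names that do not belong to the source state
            new_an[q] = frozenset(acc & set(mu[q]))
        an = new_an
    return an
```

For example, if state `A` (names `x`, `y`) moves with label names `x`, `n` to `B` (name `u`, embedded as `y`), and `B` then fires a label on `u`, name `y` of `A` becomes active only at the second step, while the fresh `n` is discarded by the intersection.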

The following proposition expresses the important properties of active names: any name correspondence between two equivalent states is a total correspondence between the active names of the two states; moreover, any correspondence is possible between the non-active names of two equivalent states.

Proposition 4. Let $A$ be a redundancy-consistent HDN. Then:

1. if $(P, \delta, Q) \in \Phi_A^n(U_A)$ then $\delta \cap (an_A^n(P) \times an_A^n(Q))$ is a total bijection between $an_A^n(P)$ and $an_A^n(Q)$;

2. if $(P, \delta, Q) \in \Phi_A^n(U_A)$ and $\delta \cap (an_A^n(P) \times an_A^n(Q)) = \delta' \cap (an_A^n(P) \times an_A^n(Q))$ then $(P, \delta', Q) \in \Phi_A^n(U_A)$.

We can exploit the properties of active names to obtain a more compact representation of $\Phi_A^n(U_A)$: only the correspondences of the active names are explicitly represented for each pair of states in the same equivalence class. There are cases in which the introduction of active names leads to a dramatic reduction of the correspondences that have to be represented explicitly. An extreme example is the universal relation $U_A$: while all the name correspondences between each pair of states appear in $U_A$, none of them has to be represented explicitly, since no name is active at this point.

Also in the cases where a large number of correspondences exist between two equivalent states, a compact representation can be found for them. In fact, let $\mathcal{A}_A^n(P,Q)$ be the set of name correspondences that exist, according to $\Phi_A^n(U_A)$, between the active names of $P$ and the active names of $Q$:

$\mathcal{A}_A^n(P,Q) = \{\delta \cap (an_A^n(P) \times an_A^n(Q)) \mid (P, \delta, Q) \in \Phi_A^n(U_A)\}$.

The following proposition shows that $\mathcal{A}_A^n(Q,Q)$ is a permutation group on the active names of $Q$; it is hence sufficient to represent it by means of a set of generators. Moreover, $\mathcal{A}_A^n(P,Q)$ can be recovered, starting from any of its elements $\delta$, by composing $\delta$ with all the elements of $\mathcal{A}_A^n(Q,Q)$; it is hence sufficient to represent explicitly only one element of $\mathcal{A}_A^n(P,Q)$.

Proposition 5. Let $A$ be a redundancy-consistent HDN. Then:

1. $\mathcal{A}_A^n(Q,Q)$ is a permutation group on $an_A^n(Q)$;

2. if $\delta \in \mathcal{A}_A^n(P,Q)$ then $\mathcal{A}_A^n(P,Q) = \{\delta;\delta' \mid \delta' \in \mathcal{A}_A^n(Q,Q)\}$.
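Proposition 5 suggests storing, for each pair of equivalent states, a set of generators of $\mathcal{A}_A^n(Q,Q)$ and a single element $\delta$ of $\mathcal{A}_A^n(P,Q)$. The following Python sketch is our own illustration with hypothetical function names: it expands a generator set into the whole group by closure under composition, and then recovers all correspondences as $\delta;\delta'$. The naive closure enumerates the whole group, so it is only meant to show the idea; dedicated permutation-group algorithms (e.g. Schreier-Sims) work on generators directly.

```python
def compose(f, g):
    """Diagrammatic composition f;g of name maps: apply f, then g."""
    return {x: g[f[x]] for x in f}


def _key(p):
    return tuple(sorted(p.items()))


def generate_group(generators, names):
    """All permutations of `names` generated by `generators`.

    In a finite group every inverse is a positive power, so closing the
    identity under right multiplication by generators yields the whole group.
    """
    identity = {n: n for n in names}
    group = {_key(identity): identity}
    frontier = [identity]
    while frontier:
        p = frontier.pop()
        for g in generators:
            q = compose(p, g)
            if _key(q) not in group:
                group[_key(q)] = q
                frontier.append(q)
    return list(group.values())


def correspondences(delta, group):
    """Recover A(P,Q) = { delta;d | d in A(Q,Q) } from a single element delta."""
    return [compose(delta, d) for d in group]
```

With a transposition and a 3-cycle on three names the closure yields all six permutations, while a single transposition yields a group of two, so one stored $\delta$ expands into exactly two correspondences.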
1 Normalize processes $P_1$ and $P_2$. Let $(Q_i, \sigma_i) = norm(P_i)$ for $i = 1,2$.

2 Generate the part of the HDN $\Pi$ that is reachable from $(0,Q_1)$ and $(0,Q_2)$.

3 Initialization:

3.1 For each (reachable) state $Q$ of $\Pi$, initialize an[Q] to the empty set.

3.2 Initialize part to the singleton partition on the (reachable) states of $\Pi$.

3.3 For each pair of (reachable) states $Q$ and $Q'$, initialize Delta[Q, Q'] to the empty set of name relations.

4 Repeat the following steps until partition part becomes stable:

4.1 Compute the non-redundant transitions according to part.

4.2 Update the sets of active names an[Q] for all the states $Q$.

4.3 Refine part according to functor $\Psi_\Pi$. For each pair of states $Q$ and $Q'$ that are still in the same block of part, put in Delta[Q, Q'] (a compact representation of) the valid relations between an[Q] and an[Q'].

5 Check if $(0,Q_1)$ and $(0,Q_2)$ are in the same class and if $\sigma_1;\sigma_2^{-1}$ is in Delta[$(0,Q_1)$, $(0,Q_2)$].



We are currently working on the implementation of an algorithm that exploits these techniques to check bisimilarity of finitary asynchronous π-processes. In the table above we sketch the main steps that have to be performed to check whether processes $P_1$ and $P_2$ are equivalent. We plan to integrate it within HAL, a verification environment for calculi with name passing [3].

We conclude this section with some comments on the complexity of the algorithm. It is not possible, in general, to find an upper bound for the number of states and transitions of the HDN corresponding to a finitary π-process $P$ as a function of the syntactical length of $P$: in fact this problem is equivalent to finding an upper bound to the length of the tape used by a given Turing machine, which is an undecidable problem. Once the HDN is built, the time complexity of building the largest HDN-bisimulation is polynomial in the number $s$ of states and $t$ of transitions of the HDN, and exponential in the maximum number $n$ of names that appear in the states. The polynomial complexity in $s$ and $t$ is typical of partitioning algorithms: each iteration of step 4 of the algorithm refines the partition of states, and at most $s - 1$ refinements are possible, after which all the states are in different blocks. However, the algorithm has to deal with correspondences between names, and the number of such correspondences between two states can grow exponentially in $n$; hence the algorithm is exponential in $n$. Even if these correspondences are represented in a compact way by means of permutation groups, the exponential in the number of names cannot be avoided: some of the operations on permutation groups used in the algorithm are in fact exponential in the number $n$ of elements.

5 Concluding remarks 

In this paper we have presented the model of history dependent transition systems with negative transitions (HDN). They are an extended version of labelled transition systems and are adequate for asynchronous calculi with name passing.
We have also defined a finitary characterization of bisimilarity for the π-calculus; this characterization can be modeled by HDN and, as a consequence, finite state representations can be computed for a family of π-processes.

In this paper we have considered only the asynchronous π-calculus without matching. In [10], however, HDN are applied also to the asynchronous π-calculus with matching, as well as to the early, late [7], and open [13,11] semantics of the π-calculus with synchronous communications.

We are also working to extend the approach described in this paper to weak asynchronous bisimulation. The alternative characterization given by 4-bisimulation works also for the weak semantics: it is sufficient to replace the strong transitions $\longrightarrow$ with weak transitions $\Longrightarrow$. Unfortunately, weak hot-potato bisimulation does not coincide with weak asynchronous bisimulation: it is not safe to force weak outputs to be emitted as soon as they are ready, since in this case the firing of an output can discard possible behaviors. For instance, in process $\tau.\bar{a}b + a(c).0$ the input transition is not performed at all if the output transition $\tau.\bar{a}b + a(c).0 \xRightarrow{\bar{a}b} 0$ takes precedence. To apply HDN successfully also to the weak asynchronous π-calculus, it is necessary to find conditions that allow a weak output transition to be fired without discarding behaviors.
Abstract. The aim of this work is to investigate mechanical support for process
algebra, both for concrete applications and theoretical properties. Two approaches 
are presented using the verification system PVS. One approach declares process 
terms as an uninterpreted type and specifies equality on terms by axioms. This is 
convenient for concrete applications where the rewrite mechanisms of PVS can 
be exploited. For the verification of theoretical results, often induction principles 
are needed. They are provided by the second approach where process terms are 
defined as an abstract datatype with a separate equivalence relation. 



1 Introduction 

We investigate the possibilities of obtaining mechanical support for equational reasoning in process algebra. In particular, we consider ACP-style process algebras [2,3], where processes are represented by terms constructed from atoms (denoting atomic actions) and operators such as choice (non-determinism), sequential composition, and parallel composition. Axioms specify which process terms are considered to be equal.

The idea is to apply equational reasoning to processes, similar to normal arithmetic. This reasoning is often very tedious and error-prone, and it is difficult to check all details manually. Concurrency in particular, which is usually unfolded into a sequential term representing all interleavings, can generate large and complex terms. Hence the need for convenient proof support for process algebra. We investigate two aspects:

- Mechanical support for the verification of concrete applications. The aim is usually to verify that an implementation satisfies a specification. Both are expressed in process algebra, where the implementation is more detailed, with additional (internal) actions. The goal is to show that the specification equals the implementation after the abstraction from internal actions. The proof proceeds by rewriting the implementation using the axioms until the specification is obtained.

- Mechanical support for the proof of theoretical properties of a process algebra. A common proof technique is based on so-called elimination theorems. Such a theorem states that any closed process term in a given process algebra can be rewritten into a basic term, i.e. a term consisting of only atoms, choices, and atom-prefixes (restricted sequential composition). Thus a property for general process terms can be reduced to one for basic terms, which can then be proved by induction on the structure or the length of basic terms.
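To make the notion of elimination concrete, here is a small Python sketch for closed terms of PA (introduced in Section 2.1). It assumes the usual PA axioms for sequential composition, A4 (right distributivity) and A5 (associativity), and for merge and left merge, M1 to M4, as in [2,3]; the M-labels follow that common numbering and are an assumption on our part. Terms are nested tuples and the function names are illustrative only; this is a plain recursive normalizer, not the PVS formalization developed in this paper.

```python
# Terms: ("atom", a) | ("+", x, y) | (".", x, y) | ("||", x, y) | ("lm", x, y),
# where "lm" stands for the left merge.


def atom(a):
    return ("atom", a)


def seq(b, t):
    """Basic term for b . t, with b and t basic (axioms A4 and A5)."""
    if b[0] == "atom":
        return (".", b, t)
    if b[0] == "+":  # A4: (x + y) . z = x . z + y . z
        return ("+", seq(b[1], t), seq(b[2], t))
    # b = a . b'; A5: (a . b') . t = a . (b' . t)
    return (".", b[1], seq(b[2], t))


def left_merge(b, t):
    """Basic term for b lm t, with b and t basic."""
    if b[0] == "atom":  # M2: a lm x = a . x
        return (".", b, t)
    if b[0] == "+":  # M4: (x + y) lm z = x lm z + y lm z
        return ("+", left_merge(b[1], t), left_merge(b[2], t))
    # M3: (a . x) lm y = a . (x || y)
    return (".", b[1], merge(b[2], t))


def merge(b, t):
    """M1: x || y = x lm y + y lm x."""
    return ("+", left_merge(b, t), left_merge(t, b))


def eliminate(term):
    """Rewrite a closed PA term into a basic term."""
    if term[0] == "atom":
        return term
    x, y = eliminate(term[1]), eliminate(term[2])
    if term[0] == "+":
        return ("+", x, y)
    if term[0] == ".":
        return seq(x, y)
    if term[0] == "||":
        return merge(x, y)
    if term[0] == "lm":
        return left_merge(x, y)
    raise ValueError(f"unknown operator {term[0]!r}")


def is_basic(t):
    """Atoms, choices of basic terms, and atom-prefixes a . t."""
    if t[0] == "atom":
        return True
    if t[0] == "+":
        return is_basic(t[1]) and is_basic(t[2])
    return t[0] == "." and t[1][0] == "atom" and is_basic(t[2])
```

For example, `a || b` is rewritten to `a . b + b . a`. Recursion always proceeds on strictly smaller basic terms, which is the intuition behind induction on basic terms.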

Since our goal is to reason about recursive, possibly infinite, processes and to verify 
not only concrete applications, but also general theoretical results, we do not aim at 

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 270-284, 1999. 
completely automatic verification. In this paper, we investigate how process algebra can
be incorporated in the framework of the tool PVS (Prototype Verification System) [16]. 
Properties can be proved in PVS by means of an interactive proof checker. This means 
that the user applies proof commands to simplify the goal that must be proven, until it 
can be proved automatically by the powerful decision procedures of the tool. 

We experiment with two different definitions of process algebra in the specification 
language of PVS, a typed higher-order logic. One possibility is to define process terms 
by means of the abstract-datatype mechanism of PVS which generates, among others, a 
useful induction scheme for the datatype, allowing induction on the structure of terms. 
As an alternative, we investigate how the rewriting mechanisms of PVS can be exploited 
for equational reasoning. Since process algebra, as a method for specifying and verify- 
ing complex systems, is still under development, many different algebras already exist 
and others are still being designed. Therefore, the goal is to create a flexible framework 
in PVS that allows experiments with tool support for customized process algebras. 

Related Work. A lot of effort has been devoted to the development of dedicated 
tools for process algebra. For PSF [13], an extension of ACP with abstract datatypes, 
tools are available that mainly support specification and simulation. PAM [12] is a related
tool which provides flexible language support. Another class of dedicated tools
aims at automatic verification, including bisimulation and model checkers. An example 
is the Concurrency Factory [8]. 

More related to our work is research on the use of general purpose proof checkers. E.g., tool support for CCS and CSP has been obtained using HOL [6,7,15]. This theorem prover has also been used to get mechanized support for reasoning with the π-calculus [14]. For μCRL, an ACP-like language with data structures, both Coq [5,11] and PVS [10] have been investigated. In [5] pure algebraic reasoning is used, whereas the work described in [10,11] combines algebraic and assertional reasoning.

Most of the research mentioned above aims at concrete applications. The only support for the verification of theoretical issues concerns the soundness proof of algebraic axioms, based on a specific semantic model [6,14,15]. Whereas this often concerns theory about the underlying model, we are more interested in the verification of theoretical results on the axiomatic level, without relying on any underlying model.

Also different is that we explicitly study the choices that can be made when incorporating process algebra in a general purpose proof checker. In that respect, our work is probably most related to research on tool support for a CSP-like algebra by means of HOL [9]. In fact, that work investigates approaches similar to ours, although it only considers small concrete examples. New in our paper is, besides the verification of non-trivial examples, that we additionally show how to obtain proof support for the development of ACP-style theory on the axiomatic level.

Overview. In Section 2, we investigate two alternatives for the modeling of process algebra in PVS. An approach where process terms are defined as an abstract datatype, with a separate equivalence relation on terms, is presented in Section 3. It is used to prove a number of theoretical results, using induction schemes provided by PVS. Section 4 describes an alternative approach where process terms are defined as an uninterpreted type, allowing convenient rewriting of concrete process terms. Concluding remarks can be found in Section 5.
2 Modeling Process Algebra in PVS

We discuss two approaches to defining process algebra in PVS. First, in Section 2.1, we 
briefly introduce the process-algebraic framework considered in this paper. A straight- 
forward formulation in PVS, using uninterpreted types plus equality, is presented in 
Section 2.2. An approach where terms are defined as an abstract datatype is described 
in Section 2.3. 

2.1 Process Algebra 

To illustrate the main concepts, we consider theory PA (Process Algebra), as defined 
in [2,3]. This theory is presented in Table 1, where parameter A represents the set of 
atoms. The first entry of this table specifies the sorts; P is the sort of all process terms. 

The second entry lists the standard algebraic operators: choice, denoted +, sequential composition, denoted ·, parallel composition or merge, denoted ‖, and an auxiliary operator called the left merge, denoted ⌊⌊, which is used to axiomatize the merge. Intuitively, the left merge corresponds to parallel execution, with the restriction that the left process executes the first action.

The third entry of Table 1 contains the axioms. For instance, Axiom A4 specifies right-distributivity of sequential composition over choice. The absence of left-distributivity, x · (y + z) = x · y + x · z, implies that processes with different moments of choice are distinguished. The axioms define an equivalence relation on processes. A model of these axioms, which shows their consistency, takes as processes the equivalence classes of closed terms (i.e., terms without variables) under bisimulation. Note, however, that this is only one possible model. A strong point of axiomatic reasoning is that it is model independent.
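The bisimulation model can be made concrete with a small Python sketch. It is our own illustration, not part of the PVS development, and the tuple encoding of terms as well as the names `canon`, `seq`, `merge` and `lmerge` are assumptions. Each closed term is mapped to a canonical form: a frozenset of branches (action, continuation), where the continuation `None` stands for successful termination. For finite processes such as closed PA terms, two terms are bisimilar exactly when these canonical forms are equal, so axioms can be tested by comparing canonical forms.

```python
# Closed PA terms as nested tuples (assumed encoding):
#   ("atm", a) | ("seq", x, y) | ("alt", x, y) | ("mrg", x, y) | ("lmrg", x, y)
# A process is a frozenset of (action, continuation); None = successful termination.

def seq(p, q):
    """Sequential composition: q is appended after every terminating branch of p."""
    return frozenset((a, q if k is None else seq(k, q)) for a, k in p)

def lmerge(p, q):
    """Left merge: p performs the first action, then its rest runs in parallel with q."""
    return frozenset((a, q if k is None else merge(k, q)) for a, k in p)

def merge(p, q):
    """Merge as interleaving, mirroring M1."""
    return lmerge(p, q) | lmerge(q, p)

def canon(t):
    """Canonical form of a closed PA term modulo bisimulation."""
    op = t[0]
    if op == "atm":
        return frozenset({(t[1], None)})
    x, y = canon(t[1]), canon(t[2])
    if op == "seq":
        return seq(x, y)
    if op == "alt":
        return x | y
    if op == "mrg":
        return merge(x, y)
    if op == "lmrg":
        return lmerge(x, y)
    raise ValueError(f"unknown operator {op!r}")

a, b, c = ("atm", "a"), ("atm", "b"), ("atm", "c")
# A4 (right distributivity) holds in this model ...
assert canon(("seq", ("alt", a, b), c)) == canon(("alt", ("seq", a, c), ("seq", b, c)))
# ... but left distributivity does not: the moment of choice differs.
assert canon(("seq", a, ("alt", b, c))) != canon(("alt", ("seq", a, b), ("seq", a, c)))
```

The last assertion shows concretely what "different moments of choice" means: in a · (b + c) the choice is made after a, whereas in a · b + a · c it is made before a.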

As is standard for equational specifications, there are general substitution and context rules which express that a process can be replaced by an equivalent term in any context, i.e., inside any term.

PA(A)

Sorts:       A, P   (A ⊆ P)
Operators:   +, ·, ‖, ⌊⌊ : P × P → P
Variables:   a : A;  x, y, z : P

Axioms:
A1  x + y = y + x                   M1  x ‖ y = x ⌊⌊ y + y ⌊⌊ x
A2  (x + y) + z = x + (y + z)       M2  a ⌊⌊ x = a · x
A3  x + x = x                       M3  a · x ⌊⌊ y = a · (x ‖ y)
A4  (x + y) · z = x · z + y · z     M4  (x + y) ⌊⌊ z = x ⌊⌊ z + y ⌊⌊ z
A5  (x · y) · z = x · (y · z)

Table 1. The process algebra PA.



2.2 Using Uninterpreted Types plus Equality 

In PVS theory PArew, we model process algebra PA with the intention to exploit the rewriting mechanisms of PVS. Theory PArew is parameterized by the type Atoms. Process terms are simply defined as some non-empty uninterpreted type, together with a function trm which maps atoms into terms. This function is declared as a conversion in PVS, which means that it need not be mentioned explicitly.

PArew [Atoms : NONEMPTY_TYPE] : THEORY
BEGIN

Terms : NONEMPTY_TYPE
trm : [Atoms -> Terms]

CONVERSION trm

Next we define the operators as functions in the language of PVS and axiomatize equality on terms, using the built-in equality on uninterpreted types. Frequently, atoms are interpreted as terms using conversion trm. E.g., a o x is interpreted as trm(a) o x. Moreover, note that o binds stronger than //, which binds stronger than +.

+, o, //, lmrg : [Terms, Terms -> Terms]

a : VAR Atoms
x, y, z : VAR Terms

A1 : AXIOM x + y = y + x
A2 : AXIOM (x + y) + z = x + (y + z)
A3 : AXIOM x + x = x
A4 : AXIOM (x + y) o z = x o z + y o z
A5 : AXIOM (x o y) o z = x o (y o z)
M1 : AXIOM x // y = lmrg(x, y) + lmrg(y, x)
M2 : AXIOM lmrg(a, x) = a o x
M3 : AXIOM lmrg(a o x, y) = a o (x // y)
M4 : AXIOM lmrg(x + y, z) = lmrg(x, z) + lmrg(y, z)

END PArew

In general, one should be careful with axioms in PVS, because they might introduce inconsistencies. However, as mentioned in Section 2.1, there are several models satisfying the above axioms, which shows that they are consistent. For the time being, we have not formalized a model in PVS, since our main interest concerns proof support for ACP-style axiomatic reasoning. When PVS is used for a customized process algebra, its consistency must of course be established by providing a model.

As a simple application of this theory, we present theory PArewex, which imports PArew. The theorem expand shows the equivalence of a parallel process and a sequential term representing all interleavings. This theorem can be proved automatically in PVS after installing automatic rewrites on all axioms except A1 (commutativity, used as a left-to-right rewrite rule, would make rewriting loop).

PArewex : THEORY 
BEGIN 

Atoms : TYPE = {a, b, c, d}

IMPORTING PArew [Atoms] 

expand : THEOREM (a+b) o (a+b) // (c+d) = 

a o (a o (c + d) + b o (c + d) + (c o (a + b) + d o (a + b))) + 

b o (a o (c + d) + b o (c + d) + (c o (a + b) + d o (a + b))) + 

c o (a o (a + b) + b o (a + b) ) + 

d o (a o (a + b) + b o (a + b) ) 

END PArewex 
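The automatic proof of expand can be mimicked outside PVS. The Python sketch below is a naive innermost rewriter, our own illustration and not the PVS rewriting engine, over an assumed tuple encoding of terms. It applies A2, A3, A4, A5 and M1 through M4 from left to right and leaves out commutativity A1. Both sides of expand reach the same normal form, so a purely syntactic comparison succeeds.

```python
# Terms as tuples: ("atm", a), ("alt", x, y) for x + y, ("seq", x, y) for x o y,
# ("mrg", x, y) for x // y, ("lmrg", x, y) for lmrg(x, y).
def atm(a): return ("atm", a)
def alt(x, y): return ("alt", x, y)
def seq(x, y): return ("seq", x, y)
def mrg(x, y): return ("mrg", x, y)
def lmrg(x, y): return ("lmrg", x, y)


def step(t):
    """Apply one axiom at the root, left to right; return None if none applies."""
    if t[0] == "atm":
        return None
    op, x, y = t
    if op == "alt":
        if x == y:                                    # A3
            return x
        if x[0] == "alt":                             # A2
            return alt(x[1], alt(x[2], y))
    elif op == "seq":
        if x[0] == "alt":                             # A4
            return alt(seq(x[1], y), seq(x[2], y))
        if x[0] == "seq":                             # A5
            return seq(x[1], seq(x[2], y))
    elif op == "mrg":                                 # M1
        return alt(lmrg(x, y), lmrg(y, x))
    elif op == "lmrg":
        if x[0] == "atm":                             # M2
            return seq(x, y)
        if x[0] == "seq" and x[1][0] == "atm":        # M3
            return seq(x[1], mrg(x[2], y))
        if x[0] == "alt":                             # M4
            return alt(lmrg(x[1], y), lmrg(x[2], y))
    return None


def normalize(t):
    """Innermost strategy: normalize the arguments, then rewrite at the root."""
    if t[0] != "atm":
        t = (t[0], normalize(t[1]), normalize(t[2]))
    r = step(t)
    return t if r is None else normalize(r)


a, b, c, d = atm("a"), atm("b"), atm("c"), atm("d")
ab, cd = alt(a, b), alt(c, d)
inner = alt(alt(seq(a, cd), seq(b, cd)), alt(seq(c, ab), seq(d, ab)))
tail = alt(seq(a, ab), seq(b, ab))
lhs = mrg(seq(ab, ab), cd)                            # (a+b) o (a+b) // (c+d)
rhs = alt(alt(alt(seq(a, inner), seq(b, inner)), seq(c, tail)), seq(d, tail))
assert normalize(lhs) == normalize(rhs)
```

Because A2 is oriented to the right, differently bracketed sums on the two sides end up in the same shape, which is why the comparison works without A1.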

In Section 4, we illustrate this approach by a more complex process algebra and a non-trivial example. However, this framework is not suitable for proving theoretical results that rely on inductive proofs, since an uninterpreted type comes with no induction scheme over terms.
2.3 Defining Process-Algebra Terms as an Abstract Datatype

Proofs about properties of process algebra often use induction on the structure of terms. 
Since PVS generates such induction schemes for abstract datatypes, it seems convenient 
to model process terms as an abstract datatype. Hence we present an approach in which 
the terms of PA are represented as an abstract datatype with type Atoms as a parameter. 
The datatype below contains five so-called constructors: atm to turn atoms into terms, and four operators o, +, //, and lmrg for, respectively, sequential composition, choice, merge and left merge.

PA_terms [Atoms: TYPE] : DATATYPE
BEGIN
  atm(at: Atoms)                 : atom?
  o(sq1, sq2: PA_terms)          : seq?
  +(ch1, ch2: PA_terms)          : choice?
  //(mrg1, mrg2: PA_terms)       : merge?
  lmrg(lmrg1, lmrg2: PA_terms)   : lmerge?
END PA_terms

When type checking this datatype definition, the PVS system generates a new file which contains a large number of useful definitions and properties of the datatype. E.g., a subterm relation x << y is defined, with an axiom expressing that it is well-founded.

PA_terms_well_founded: AXIOM well_founded?[PA_terms](<<);

Moreover, an induction scheme is generated, expressing that a property p on terms can be proved by showing that it holds for all atoms and by proving that it holds for the other operators if the subterms already satisfy p.

Defining an Equivalence Relation on Terms. Observe that for terms that are defined as an abstract datatype, equality has a fixed meaning in PVS, namely syntactic equality. Hence, equality cannot be used to express equivalence of process terms, as we did in Section 2.2. Therefore, in PVS theory PA we define a separate equivalence relation, denoted ==, on PA terms, using the predefined predicate equivalence?, which implies that the relation is reflexive, symmetric, and transitive. As before, this relation is specified by the axioms of PA.

PA [Atoms: NONEMPTY_TYPE] : THEORY
BEGIN

IMPORTING PA_terms[Atoms]

== : (equivalence?[PA_terms])
a, b, c : VAR Atoms

v, w, x, y, z : VAR PA_terms

A1: AXIOM x + y == y + x

M4: AXIOM lmrg(x + y, z) == lmrg(x, z) + lmrg(y, z)

Henceforth, we omit variable declarations if they have been presented in earlier theories.

As is standard in equational reasoning, equivalent terms can be substituted for one another in contexts. Unfortunately, in the current framework, this has to be expressed explicitly, as follows.

ch_I: AXIOM x == z IMPLIES x + y == z + y
mrg_I: AXIOM x == z IMPLIES x // y == z // y
Because it supports inductive proofs, the approach of the current subsection
provides a more powerful framework than the one of Section 2.2. Therefore, we first 
study the applicability of the approach with abstract datatypes in the next section. 
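For readers less familiar with PVS datatypes, the sketch below gives a rough Python analogue of what the PA_terms declaration provides. It is our own illustration, not PVS-generated code, and the class names are assumptions. There is one constructor class per operator, with the accessors as fields, and recognizers correspond to `isinstance` checks. The function `children` yields the immediate subterms, whose transitive closure corresponds to <<. The function `size` is defined by structural recursion, which is the computational counterpart of the generated induction scheme.

```python
from dataclasses import dataclass, fields
from typing import Union


@dataclass(frozen=True)
class Atm:          # atm(at: Atoms) : atom?
    at: str


@dataclass(frozen=True)
class Seq:          # o(sq1, sq2: PA_terms) : seq?
    sq1: "Term"
    sq2: "Term"


@dataclass(frozen=True)
class Choice:       # +(ch1, ch2: PA_terms) : choice?
    ch1: "Term"
    ch2: "Term"


@dataclass(frozen=True)
class Merge:        # //(mrg1, mrg2: PA_terms) : merge?
    mrg1: "Term"
    mrg2: "Term"


@dataclass(frozen=True)
class LMerge:       # lmrg(lmrg1, lmrg2: PA_terms) : lmerge?
    lmrg1: "Term"
    lmrg2: "Term"


Term = Union[Atm, Seq, Choice, Merge, LMerge]


def children(t: Term) -> tuple:
    """Immediate subterms of t (empty for atoms)."""
    if isinstance(t, Atm):
        return ()
    return tuple(getattr(t, f.name) for f in fields(t))


def is_subterm(x: Term, y: Term) -> bool:
    """x << y: x is a proper subterm of y; well-founded because terms are finite."""
    return any(x == c or is_subterm(x, c) for c in children(y))


def size(t: Term) -> int:
    """Structural recursion: a base case for atoms and one step per operator."""
    return 1 + sum(size(c) for c in children(t))
```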



3 Abstract Data Types plus Equivalence Relation 

The most interesting aspect of the use of abstract datatypes for process terms is that it 
allows inductive proofs. As mentioned in the introduction, inductive proofs are often 
used in a proof technique based on basic terms and an elimination theorem. Section 3.1 
briefly explains this proof technique. Section 3.2 contains a formulation of basic terms 
in PVS. In Section 3.3, we prove in PVS that each closed PA term can be translated 
into an equivalent basic term. This translation is used in Section 3.4 to prove properties 
about the alphabet of a process term and in Section 3.5 to show associativity of the 
merge operator. 



3.1 Basic Terms and Elimination 

For a convenient treatment of realistic examples, most process algebras contain a large 
number of operators and axioms. This, however, complicates the proof of general prop- 
erties about the algebra. Hence, it is extremely useful if one can show that many oper- 
ators can be eliminated and any term can be reduced to an equivalent term with only a 
few basic operators. In the framework of [2,3], this leads to the concept of basic terms. 

Definition (Basic terms). The set of basic terms is inductively defined as follows. The atoms A are contained in the set of basic terms. Furthermore, for any a ∈ A and basic terms s, t, also a · t and s + t are basic terms. No other terms are basic terms.

It can be shown that any closed PA term can be translated into an equivalent basic term. 

Theorem (Elimination). For any closed PA term p, there exists a basic term t such that p = t can be derived from the axioms of PA.

A standard proof technique for a property on process terms is to reduce it, by the elimination theorem, to a property on basic terms, which is then proved by induction on the structure of basic terms (see Section 3.4) or on the length of basic terms (see Section 3.5). This axiomatic reasoning is model independent, and hence the property holds in any model based on closed terms as processes.



3.2 Defining Basic Terms in PVS 

To define basic terms, we extend theory PA of Section 2.3 with a predicate basic? on the abstract datatype PA_terms. This predicate is defined recursively on the structure of PA terms. In PVS, this requires a so-called measure function, which should be well-founded and should decrease with every recursive call. In general, type checking in PVS is not decidable; it might generate so-called Type Check Conditions (TCCs), which are proof obligations that have to be fulfilled for type correctness. For recursive definitions, TCCs concerning the correctness of the measure function are generated.
basic?(x) : RECURSIVE bool =
  CASES x OF
    atm(a)     : TRUE,
    o(y, z)    : atom?(y) AND basic?(z),
    +(y, z)    : basic?(y) AND basic?(z),
    //(y, z)   : FALSE,
    lmrg(y, z) : FALSE
  ENDCASES
  MEASURE x BY <<;

basic_terms : TYPE = {x | basic?(x)}

The recursive definition above leads to several TCCs, including one requiring that the 
subterm relation << is well-founded. This follows immediately from the corresponding 
axiom mentioned in Section 2.3. The other TCCs, requiring that the recursive calls are 
applied to subterms of argument x, are trivial and can be proved automatically. 
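A direct Python transcription of basic? makes these TCCs tangible. It is only an illustration, with an assumed tuple encoding of terms: every recursive call is made on an immediate subterm of the argument, so the recursion terminates because the subterm relation is well-founded.

```python
# Terms: ("atm", a) | ("seq", x, y) | ("alt", x, y) | ("mrg", x, y) | ("lmrg", x, y)
def is_atom(t):
    return t[0] == "atm"


def is_basic(t):
    """Transcription of basic?: atoms, a o t with t basic, and s + t with s, t basic."""
    op = t[0]
    if op == "atm":
        return True
    if op == "seq":                     # o(y, z): y must be an atom
        return is_atom(t[1]) and is_basic(t[2])
    if op == "alt":                     # +(y, z)
        return is_basic(t[1]) and is_basic(t[2])
    return False                        # merge and left merge are never basic
```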

Note that we have not defined basic terms as a separate abstract datatype; by defin- 
ing it as a predicate on datatype PA_terms, we obtain the desired subtype relation be- 
tween basic terms and process terms. This subtype relation is crucial for proofs based on 
the elimination theorem, as shown by the applications in the remainder of this section. 



3.3 Translating PA Terms to Basic Terms 



In theory PA2Basic we prove the elimination theorem. Note that the datatype PA_terms 
does not contain variables, which means that it specifies closed terms. We define a 
translation function pa2b which maps PA terms into basic terms. This definition is 
recursive and uses relations < and <= on terms that are presented below. 

PA2Basic [Atoms: NONEMPTY_TYPE] : THEORY

pa2b(x) : RECURSIVE {b: basic_terms | b <= x} =
  CASES x OF
    atm(a)     : atm(a),
    o(y, z)    : CASES pa2b(y) OF
                   atm(a)  : atm(a) o pa2b(z),
                   o(v, w) : v o pa2b(w o z),
                   +(v, w) : pa2b(v o z) + pa2b(w o z)
                 ENDCASES,
    +(y, z)    : pa2b(y) + pa2b(z),
    //(y, z)   : pa2b(lmrg(y, z)) + pa2b(lmrg(z, y)),
    lmrg(y, z) : CASES pa2b(y) OF
                   atm(a)  : atm(a) o pa2b(z),
                   o(v, w) : v o pa2b(w // z),
                   +(v, w) : pa2b(lmrg(v, z)) + pa2b(lmrg(w, z))
                 ENDCASES
  ENDCASES
  MEASURE x BY <;



This definition generates 26 TCCs to show, for instance, that recursive calls are applied 
to terms that are smaller than argument x, according to the relation <, and to show that 
the result of the function is a basic term not greater than the argument, according to <=. 
The main problem was to find definitions for the relations < and <= such that all TCCs could be proved. For instance, some obvious relations based on the length of terms (the number of symbols) do not work, since we have to show, for example, that pa2b(lmrg(y, z)) + pa2b(lmrg(z, y)) <= y // z.

The solution is based on a weight function on PA terms mentioned in [1]. It uses the 
exponentiation function expt, which is already available in PVS. 
weight(x): RECURSIVE {n: nat | n >= 2} =
  CASES x OF
    atm(a)     : 2,
    o(y, z)    : weight(y) * weight(z) + weight(y),
    +(y, z)    : weight(y) + weight(z) + 1,
    //(y, z)   : expt(2, weight(y) + weight(z) + 2),
    lmrg(y, z) : expt(2, weight(y) + weight(z))
  ENDCASES
  MEASURE x BY <<;

<(x, y) : bool = weight(x) < weight(y)

<=(x, y) : bool = x < y OR x = y
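The effect of the exponential weight can be checked numerically. The Python sketch below is an illustration with an assumed tuple encoding of terms. It transcribes weight and compares it with a naive symbol count on the unfolding of a // b by M1 into lmrg(a, b) + lmrg(b, a). The symbol count grows under this step, whereas the weight drops, which is the kind of inequality the TCCs of pa2b require.

```python
def weight(t):
    """Transcription of the PVS weight function; the result is always >= 2."""
    op = t[0]
    if op == "atm":
        return 2
    wy, wz = weight(t[1]), weight(t[2])
    if op == "seq":
        return wy * wz + wy
    if op == "alt":
        return wy + wz + 1
    if op == "mrg":
        return 2 ** (wy + wz + 2)
    if op == "lmrg":
        return 2 ** (wy + wz)
    raise ValueError(f"unknown operator {op!r}")


def symbols(t):
    """Number of symbols in t: the 'obvious' measure that fails here."""
    return 1 if t[0] == "atm" else 1 + symbols(t[1]) + symbols(t[2])


a, b = ("atm", "a"), ("atm", "b")
merged = ("mrg", a, b)
unfolded = ("alt", ("lmrg", a, b), ("lmrg", b, a))   # right-hand side of M1
print(symbols(merged), symbols(unfolded))            # 3 7
print(weight(merged), weight(unfolded))              # 64 33
```

In general, with w = weight(y) + weight(z), the unfolded term has weight 2^w + 2^w + 1 = 2^(w+1) + 1, which is always smaller than 2^(w+2) = weight(y // z).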

The elimination theorem, pa2b_eq, expresses that the result of translation pa2b is 
equivalent to its argument. The proof is rather tedious and uses induction over the struc- 
ture of argument x as provided by the induction mechanism generated by PA_terms. 
pa2b_eq: THEOREM pa2b(x) == x 
END PA2Basic 

Our proof of this elimination theorem is constructive in the sense that it provides a concrete transformation from PA terms to basic terms. This is in contrast with the literature on process algebra, where the proofs usually rely on term-rewriting theory [2,3].
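Because the proof is constructive, pa2b can also be executed. The Python sketch below transcribes its case analysis; the tuple encoding of terms and the helper names are our assumptions. Termination follows from the argument established by the TCCs, namely that the weight decreases with every recursive call. The comments name the axiom that justifies each case.

```python
# Terms: ("atm", a) | ("seq", x, y) | ("alt", x, y) | ("mrg", x, y) | ("lmrg", x, y)
def pa2b(x):
    """Translate a closed PA term into a basic term, following the PVS definition."""
    op = x[0]
    if op == "atm":
        return x
    y, z = x[1], x[2]
    if op == "alt":
        return ("alt", pa2b(y), pa2b(z))
    if op == "mrg":                                   # M1
        return ("alt", pa2b(("lmrg", y, z)), pa2b(("lmrg", z, y)))
    py = pa2b(y)                                      # basic: atm, a o w, or v + w
    if op == "seq":
        if py[0] == "atm":
            return ("seq", py, pa2b(z))
        v, w = py[1], py[2]
        if py[0] == "seq":                            # A5
            return ("seq", v, pa2b(("seq", w, z)))
        return ("alt", pa2b(("seq", v, z)), pa2b(("seq", w, z)))      # A4
    if op == "lmrg":
        if py[0] == "atm":                            # M2
            return ("seq", py, pa2b(z))
        v, w = py[1], py[2]
        if py[0] == "seq":                            # M3
            return ("seq", v, pa2b(("mrg", w, z)))
        return ("alt", pa2b(("lmrg", v, z)), pa2b(("lmrg", w, z)))    # M4
    raise ValueError(f"unknown operator {op!r}")


def is_basic(t):
    """basic? from Section 3.2."""
    if t[0] == "atm":
        return True
    if t[0] == "seq":
        return t[1][0] == "atm" and is_basic(t[2])
    return t[0] == "alt" and is_basic(t[1]) and is_basic(t[2])


a, b, c = ("atm", "a"), ("atm", "b"), ("atm", "c")
print(pa2b(("mrg", a, b)))
# ('alt', ('seq', ('atm', 'a'), ('atm', 'b')), ('seq', ('atm', 'b'), ('atm', 'a')))
```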

3.4 Using Elimination and Structural Induction on Basic Terms 

As a simple application of the elimination theorem, we define the alphabet of PA terms 
by means of three axioms that specify the alphabet of basic terms. Additionally, Axiom 
AB4 specifies that equivalent terms have the same alphabet. 

Alpha [Atoms : NONEMPTY_TYPE] : THEORY
BEGIN

IMPORTING PA2Basic[Atoms]

btx, bty, btz : VAR basic_terms

alpha : [PA_terms -> setof[Atoms]]   % alphabet

AB1: AXIOM alpha(atm(a)) = singleton(a)
AB2: AXIOM alpha(atm(a) o x) = add(a, alpha(x))
AB3: AXIOM alpha(x + y) = union(alpha(x), alpha(y))
AB4: AXIOM x == y IMPLIES alpha(x) = alpha(y)


We show that this implies the expected property for the alphabet of a general sequential 
composition, as stated in theorem AB2pa. Using theorem pa2b_eq, it is sufficient to 
prove the property for basic terms, as expressed by lemma AB2b. 

AB2b : LEMMA alpha(btx o bty) = union (alpha (btx) , alpha (bty) ) 

AB2pa : THEOREM alpha(x o y) = union(alpha(x) , alpha(y) ) 

END Alpha 

Lemma AB2b has been proved by induction on btx; this gives induction on the struc- 
ture of PA_terms, but since basic? (btx) holds, the cases for non-basic terms can be 
discharged trivially. Hence, this boils down to induction on the structure of basic terms. 
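The structure of this proof can be mirrored in Python. In the sketch below, which assumes a tuple encoding of terms, `alpha` is computed on basic terms using only AB1 to AB3. A hypothetical helper `cat` builds a basic term equal to btx o bty by A4 and A5. Its recursion follows the structure of btx, which is exactly the induction used for AB2b, and the final assertion checks one instance of that lemma.

```python
# Basic terms: ("atm", a) | ("seq", ("atm", a), t) | ("alt", s, t)
def alpha(t):
    """Alphabet of a basic term, one case per axiom AB1-AB3."""
    if t[0] == "atm":                                  # AB1
        return {t[1]}
    if t[0] == "seq" and t[1][0] == "atm":             # AB2
        return {t[1][1]} | alpha(t[2])
    if t[0] == "alt":                                  # AB3
        return alpha(t[1]) | alpha(t[2])
    raise ValueError("not a basic term")


def cat(s, t):
    """Hypothetical helper: a basic term equal to s o t (s, t basic), by recursion on s."""
    if s[0] == "atm":
        return ("seq", s, t)
    if s[0] == "seq":                                  # A5: (a o x) o t = a o (x o t)
        return ("seq", s[1], cat(s[2], t))
    return ("alt", cat(s[1], t), cat(s[2], t))        # A4: (x + y) o t = x o t + y o t


a, b, c, d = (("atm", n) for n in "abcd")
btx = ("alt", ("seq", a, b), c)
bty = ("seq", d, a)
assert alpha(cat(btx, bty)) == alpha(btx) | alpha(bty) == {"a", "b", "c", "d"}
```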
3.5 Using Elimination and Induction on the Length of Basic Terms

General properties of the merge and the left merge are often useful in verifications. 
For process algebra PA these properties can be proved, in a model-independent way, 
by means of the elimination theorem. For a process algebra with recursion this is not 
always possible, and then they are introduced as axioms, called the axioms of standard 
concurrency [2,3]. In this section, we concentrate on associativity of the merge, called 
ASC6. The proof uses commutativity of the merge, called ASC2, and a property of the 
left merge, ASC4. The other axioms of standard concurrency deal with communication 
and are omitted here. Property ASC2 is proved easily using M1 and A1.

PAsc [Atoms: NONEMPTY_TYPE] : THEORY
BEGIN 

IMPORTING PA2Basic [Atoms] 

ASC2 : THEOREM x // y == y // x 

To prove the other two properties, elimination theorem pa2b_eq is used to reduce these 
properties to basic terms, as expressed by ASC4b and ASC6b. By lemma ASC46b, these 
two lemmas are proved simultaneously by strong natural induction on the sum of the 
lengths of the basic terms. The proof of ASC46b also uses case analysis on the structure 
of basic terms, illustrating the importance of the reduction to basic terms. 
length(x) : RECURSIVE posnat =
  CASES x OF
    atm(a) : 1,
    o(x, y) : length(x) + length(y),
    ...   % similar for +, //, lmrg
  ENDCASES
  MEASURE x BY <<

n : VAR nat

ASC46b : LEMMA
  (FORALL btx, bty, btz: n = length(btx) + length(bty) + length(btz)
     IMPLIES lmrg(lmrg(btx, bty), btz) == lmrg(btx, bty // btz))
  AND
  (FORALL btx, bty, btz: n = length(btx) + length(bty) + length(btz)
     IMPLIES (btx // bty) // btz == btx // (bty // btz))

ASC4b : LEMMA lmrg(lmrg(btx, bty), btz) == lmrg(btx, bty // btz)

ASC6b : LEMMA (btx // bty) // btz == btx // (bty // btz)

ASC4 : THEOREM lmrg(lmrg(x, y), z) == lmrg(x, y // z)

ASC6 : THEOREM (x // y) // z == x // (y // z)

END PAsc

The proofs of theorems ASC4 and ASC6 use lemma pa2b_eq to replace x, y, and z by 
equivalent basic terms. The proofs are completed using symmetry, transitivity, and a 
few properties of == about substitution in a context. 
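The axiomatic proofs of ASC2, ASC4 and ASC6 hold in every model. As a cheap sanity check, and not as a substitute for the proof, the Python sketch below tests them exhaustively in the bisimulation model mentioned in Section 2.1, on all processes denoted by small basic terms over two atoms. Processes are represented as frozensets of (action, continuation) branches, with `None` for termination; this encoding and the helper `basic_processes` are our assumptions.

```python
from itertools import product


def lmerge(p, q):
    """Left merge: p does the first step, then its remainder runs in parallel with q."""
    return frozenset((a, q if k is None else merge(k, q)) for a, k in p)


def merge(p, q):
    """Merge, following M1: p // q = lmrg(p, q) + lmrg(q, p)."""
    return lmerge(p, q) | lmerge(q, p)


def basic_processes(atoms, depth):
    """Hypothetical helper: processes of basic terms built in at most `depth` rounds."""
    ps = {frozenset({(a, None)}) for a in atoms}
    for _ in range(depth):
        ps = (ps
              | {frozenset({(a, t)}) for a in atoms for t in ps}   # a o t
              | {s | t for s in ps for t in ps})                   # s + t
    return ps


procs = basic_processes(["a", "b"], 1)
for x, y, z in product(procs, repeat=3):
    assert merge(x, y) == merge(y, x)                              # ASC2
    assert lmerge(lmerge(x, y), z) == lmerge(x, merge(y, z))       # ASC4
    assert merge(merge(x, y), z) == merge(x, merge(y, z))          # ASC6
print(len(procs), "processes checked")                             # 7 processes checked
```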

Observe that rewriting is cumbersome in the current approach because symmetry, transitivity, rewriting in contexts, etc., all have to be performed explicitly. Although this can be solved to some extent by defining a strategy in PVS that combines these commands, it would be more convenient if users could define their own congruence relation, such as ==, and obtain the desired rewriting. The main conclusion of this section is that the PVS facilities for abstract datatypes and subtyping are useful to prove non-trivial theorems in process-algebra theory with a reasonable amount of effort.
4 Verifying Applications Using Equational Reasoning

Since rewriting turned out to be tedious in the proofs of the previous section, we elaborate in this section on the approach of Section 2.2, where terms are defined as an uninterpreted type with axioms that specify equality on terms. In order to experiment with this approach on some more complicated applications, we axiomatize ACPτ*: the Algebra of Communicating Processes (ACP) with abstraction [3] and the binary Kleene star [4]. The formal framework is defined in Section 4.1 and applied to the verification of an Alternating-Bit Protocol (ABP) in Section 4.2. This protocol often serves as a benchmark for verifications in process algebra [3,5,11].



4.1 Defining ACP by Uninterpreted Terms and Equality 

Similar to PA, process algebra ACPτ* contains atoms and operators for sequential composition, choice, merge and left merge. In addition, there are two special atoms: δ, indicating deadlock or unsuccessful termination, and τ, representing the silent (internal) step. The merge in ACPτ* is slightly different; besides interleaving the atoms of the two processes, represented by the left merge, it is now also possible to have a synchronous communication, represented by a communication merge "|". This communication merge is defined by means of a communication function γ which defines, for a particular application, the result of the communication for each pair of atoms. A result δ indicates that the atoms cannot synchronize. The axioms of ACPτ* axiomatize rooted branching bisimulation, which means that processes with the same external behavior, but possibly different internal actions, are considered to be equal. This equivalence is particularly suitable for verifying implementations against specifications, as explained in the next subsection.

Theory ACPtbks implements ACPτ* in PVS. It has a communication function as
a parameter and contains explicit assumptions about its properties. If a theory imports 
ACPtbks with a particular function, TCCs are generated to show that the assumptions 
are fulfilled. 



ACPtbks [Atoms: NONEMPTY_TYPE, delta: Atoms, tau: Atoms,
         gamma: [Atoms, Atoms -> Atoms]] : THEORY
BEGIN

ASSUMING
  C1 : ASSUMPTION gamma(a, b) = gamma(b, a)
  C2 : ASSUMPTION gamma(gamma(a, b), c) = gamma(a, gamma(b, c))
  C3 : ASSUMPTION gamma(a, delta) = delta
  C4 : ASSUMPTION gamma(a, tau) = delta
ENDASSUMING



The definition of Terms, conversion trm, operators +, o, //, lmrg, and axioms A1 through A5 are exactly the same as in theory PArew of Section 2.2.

New are axioms for delta and tau, the definition of the communication merge /, and a changed list of axioms for concurrency. Note that B1 and B2 express that tau is not observable and can be removed, provided all options present before executing the silent action are present after executing it. Not shown are CM2 through CM4, which are identical to M2 through M4, some axioms for /, and the axioms of standard concurrency [2,3].
A6 : AXIOM x + delta = x

A7 : AXIOM delta o x = delta

B1 : AXIOM x o tau = x

B2 : AXIOM x o (tau o (y + z) + y) = x o (y + z)

/ : [Terms, Terms -> Terms]

CF : AXIOM a / b = gamma(a, b)

CM1 : AXIOM x // y = lmrg(x, y) + lmrg(y, x) + (x / y)

CM9 : AXIOM x / (y + z) = (x / y) + (x / z)

The encapsulation operator enc maps atoms of a set H to delta. It can be used to 
enforce that certain atoms communicate; they cannot occur in isolation. Not shown 
here are similar axioms that specify the abstraction operator abs which hides (internal) 
atoms of a set by mapping them to the silent action tau. 



enc, abs : [setof[Atoms], Terms -> Terms]

H : VAR setof[Atoms]

D1 : AXIOM NOT member(a, H) IMPLIES enc(H, a) = a
D2 : AXIOM member(a, H) IMPLIES enc(H, a) = delta
D3 : AXIOM enc(H, x + y) = enc(H, x) + enc(H, y)
D4 : AXIOM enc(H, x o y) = enc(H, x) o enc(H, y)
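The effect of encapsulation can be illustrated with a small Python sketch. It is our own illustration, and the tuple encoding and the helper `simplify` are assumptions. `enc` follows D1 to D4 on atoms, choice and sequential composition. `simplify` then applies A7, and A6 (together with A1 when delta stands on the left), which prunes every branch that starts with an encapsulated atom. This is how encapsulation prevents atoms in H from occurring in isolation.

```python
DELTA = ("atm", "delta")


def enc(H, t):
    """Encapsulation by D1-D4: atoms in H are renamed to delta."""
    op = t[0]
    if op == "atm":
        return DELTA if t[1] in H else t                  # D2 / D1
    if op in ("alt", "seq"):
        return (op, enc(H, t[1]), enc(H, t[2]))           # D3 / D4
    raise ValueError(f"D1-D4 do not cover {op!r}")


def simplify(t):
    """Hypothetical helper: bottom-up use of A7 and A6 (with A1 for delta on the left)."""
    if t[0] == "atm":
        return t
    op, x, y = t[0], simplify(t[1]), simplify(t[2])
    if op == "seq" and x == DELTA:                        # A7: delta o x = delta
        return DELTA
    if op == "alt" and y == DELTA:                        # A6: x + delta = x
        return x
    if op == "alt" and x == DELTA:                        # A1, then A6
        return y
    return (op, x, y)


s, a, b = ("atm", "s"), ("atm", "a"), ("atm", "b")
term = ("alt", ("seq", s, a), ("seq", b, a))              # s o a + b o a
print(simplify(enc({"s"}, term)))                         # ('seq', ('atm', 'b'), ('atm', 'a'))
```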



The binary Kleene star represents an iteration; x * y denotes the process that can re- 
peatedly behave as the body x, but it can non-deterministically stop the repetition and 
decide to behave as y. We only show the axioms that are needed in the next section. 

* : [Terms, Terms -> Terms] 

BKS1 : AXIOM x * y = (x o (x * y)) + y

BKS4 : AXIOM enc(H, x * y) = enc(H,x) * enc(H,y) 

The Fair Iteration Rule, FIR, excludes an infinite sequence of tau atoms if there is an 
alternative. The Recursive Specification Principle for the binary Kleene star, RSPbks, 
specifies the solution of a particular form of guarded recursive equations. A term x is 
guarded, denoted guard? (x) , if it cannot terminate successfully without performing at 
least one visible action. 

FIR : AXIOM tau * x = x + (tau o x) 

RSPbks : AXIOM guard? (y) AND x = (y o x) + z IMPLIES x = y * z 

END ACPtbks 

4.2 Verification of an Alternating-Bit Protocol 

To experiment with the framework of the previous subsection, we consider a version of 
the ABP with iteration and fairness. The verification of this protocol follows a standard 
approach which is the basis for any ACP-style verification; after the introduction of 
a few basic primitives, first the required service is specified. Next the implementation 
of the protocol is specified and we show that, after encapsulation and abstraction, it is 
equivalent to the specification. 

For the ABP, we need message passing with bits. Therefore, atoms are structured 
as an abstract datatype. Besides delta and tau, we have input, output, send, receive, 
and communication atoms. Input and output atoms represent the communication with 
the environment of the protocol, i.e., they represent its external interface, whereas send 
and receive are internal atoms that synchronize to a communication atom. 
ABP_Atoms [Messages : TYPE, Bits : TYPE] : DATATYPE
BEGIN
  delta : delta?
  tau : tau?
  inp(im: Messages) : imsg?
  outp(om: Messages) : omsg?
  send(sm: Messages, sb: Bits) : send?
  rec(rm: Messages, rb: Bits) : rec?
  comm(cm: Messages, cb: Bits) : comm?
END ABP_Atoms



This general structure is used in theory ABP with simple messages representing data d and acknowledgments a. Bits are represented by t (true) and f (false). Alternation of bits is defined by function alt. The hand-shake communication mechanism is defined by function gamma. It expresses that a send and a receive should be combined into a communication. Observe that importing ACPtbks leads to four TCCs corresponding to the assumptions on gamma.

ABP : THEORY
BEGIN

Messages : TYPE = {d, a}
Bits : TYPE = {t, f}

IMPORTING ABP_Atoms[Messages, Bits]

m, m0 : VAR Messages
b, b0 : VAR Bits
e, f, g : VAR ABP_Atoms

alt(b) : Bits = CASES b OF t : f, f : t ENDCASES

gamma(e, f) : ABP_Atoms =
  CASES e OF
    send(m, b) : CASES f OF
                   rec(m0, b0) : IF m0 = m AND b0 = b THEN comm(m, b) ELSE delta ENDIF
                   ELSE delta
                 ENDCASES,
    rec(m, b) : ...   % similarly
    ELSE delta
  ENDCASES

IMPORTING ACPtbks[ABP_Atoms, delta, tau, gamma]
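The four TCCs generated for the assumptions C1 to C4 can be cross-checked by brute force, since the set of atoms is finite. The Python sketch below encodes the atoms as tuples, which is our own assumption. It implements gamma as a symmetric hand-shake; the symmetric case for rec(m, b) is our reading of the elided "similarly" branch, assumed here because C1 demands commutativity. It then tests C1 to C4 on all combinations of atoms.

```python
from itertools import product

MESSAGES, BITS = ("d", "a"), ("t", "f")
DELTA, TAU = ("delta",), ("tau",)
ATOMS = ([DELTA, TAU]
         + [(k, m) for k in ("inp", "outp") for m in MESSAGES]
         + [(k, m, b) for k in ("send", "rec", "comm") for m in MESSAGES for b in BITS])


def gamma(e, f):
    """Hand-shake: send(m, b) and rec(m, b) give comm(m, b), in either order (assumed)."""
    if {e[0], f[0]} == {"send", "rec"} and e[1:] == f[1:]:
        return ("comm",) + e[1:]
    return DELTA


for x, y, z in product(ATOMS, repeat=3):
    assert gamma(x, y) == gamma(y, x)                          # C1
    assert gamma(gamma(x, y), z) == gamma(x, gamma(y, z))      # C2
    assert gamma(x, DELTA) == DELTA                            # C3
    assert gamma(x, TAU) == DELTA                              # C4
print(len(ATOMS), "atoms, all assumptions hold")               # 18 atoms, all assumptions hold
```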

The aim is to verify an ABP according to specification ABP_spec which expresses that 
it should behave as a one-place buffer, copying data on its input port to its output port. 
Note that x*delta denotes a non-terminating iteration, repeating body x forever (see axioms BKS1 and A6).

ABP_spec: Terms = (inp(d) o outp(d) ) *delta 

This specification is implemented by means of a sender and a receiver. For simplicity, 
we do not model the communication channels between them, but assume they commu- 
nicate directly; channel failures are modeled in the behavior of the receiver. 

The sender S alternates between S(t) and S(f), where S(b) gets a data item, sends it with bit b, and next repeatedly receives an erroneous acknowledgment (expressed by SE(b)) until it gets a correct one (expressed by SN(b)).

SE(b) : Terms = rec(a, alt(b)) o send(d, b)    % error part sender

SN(b) : Terms = rec(a, b)                      % normal part sender

S(b) : Terms = inp(d) o send(d, b) o (SE(b) * SN(b))

S : Terms = (S(t) o S(f)) * delta
The receiver has a similar structure; its error part, denoted by term RE(b), models in
an abstract way the possibility that messages might be corrupted by the channel and the 
receiver sends an acknowledgment with the wrong bit. Note that this error part can be 
repeated indefinitely. However, assuming fairness, only a finite number of subsequent 
errors can occur. 

RE(b) : Terms = send(a, alt(b)) o rec(d, b)    % error part receiver

RN(b) : Terms = outp(d) o send(a, b)           % normal part receiver

R(b) : Terms = rec(d, b) o (RE(b) * RN(b))

R : Terms = (R(t) o R(f)) * delta

Further, we define a set H which is used to encapsulate isolated send and receive atoms, 
and a set I which is used to abstract from communication events. 

H: setof[ABP_Atoms] = { e | send?(e) OR rec?(e) }

I: setof[ABP_Atoms] = { e | comm?(e) }

Then, the aim is to prove that the parallel composition S // R of the sender and the receiver, after encapsulating the send and receive atoms and abstracting from the communication atoms, equals the specification of the Alternating-Bit Protocol:

abs(I, enc(H, S // R)) = ABP_spec

First, we show that enc(H,S//R) = X, where X is an auxiliary term defined by: 

XE(b) : Terms = comm(a, alt(b)) o comm(d, b)   % error part protocol

XN(b) : Terms = outp(d) o comm(a, b)           % normal part protocol

X(b) : Terms = inp(d) o comm(d, b) o (XE(b) * XN(b))

X : Terms = (X(t) o X(f)) * delta

We start by unfolding the iterations inside S // R using BKS1, which makes the choice between the normal and error parts in the body of each component explicit. Next, we prove that the body of the protocol equals the body of X. This is far from trivial, but the proof in PVS is rather straightforward. First, we install a large number of axioms and some useful lemmas as automatic rewrite rules (from left to right). This includes A2, A5, A6, A7, CF, CM2 through CM9, D3, D4, and BKS4. Then, we repeatedly rewrite explicitly using CM1, expand the definition of gamma, and apply the automatic rewrites. This proof shows the main advantage of using equality over a user-defined congruence: substitution in contexts, transitivity, etc., are all implicitly incorporated in the rewriting mechanism of PVS. Using similar rewriting, we can then prove that one iteration of S in parallel with R corresponds to one iteration of X.

once_rep : LEMMA enc(H,S // R) = (X(t) o X(f)) o enc(H,S // R) 

Recursion axiom RSPbks (with delta instead of z) and Axiom A6 then lead to 
cbhv : LEMMA enc(H,S // R) = X 

Finally, using the properties of abstraction and fairness principle FIR, we obtain 
ABP_eq_spec : THEOREM abs(I ,enc(H,S // R)) = ABP_spec 
END ABP 
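The theorem is stated modulo rooted branching bisimulation, which also accounts for branching structure and, through FIR, for fairness. A much weaker but easily executable sanity check is to explore the transition structure of the auxiliary term X, hide the comm atoms as abs(I, ·) does, and confirm that the remaining visible behaviour only alternates inp(d) and outp(d), as ABP_spec requires. The Python sketch below does this up to a bounded trace length. The hand-built transition table and its state names are our own assumptions, derived from the definitions of X(b), XE(b) and XN(b).

```python
from collections import deque


def x_transitions():
    """Transitions of X = (X(t) o X(f)) * delta; state names are hypothetical."""
    trans = {}
    for b, nb in (("t", "f"), ("f", "t")):
        trans[("start", b)] = [("inp(d)", ("sent", b))]
        trans[("sent", b)] = [(f"comm(d,{b})", ("loop", b))]
        # XE(b) * XN(b): the error part (then loop again) or the normal part
        trans[("loop", b)] = [(f"comm(a,{nb})", ("err", b)),
                              ("outp(d)", ("ack", b))]
        trans[("err", b)] = [(f"comm(d,{b})", ("loop", b))]
        trans[("ack", b)] = [(f"comm(a,{b})", ("start", nb))]   # on to X(alt(b))
    return trans


def visible_traces(trans, init, max_len):
    """All visible traces of length <= max_len, with comm atoms hidden."""
    seen, traces = set(), set()
    queue = deque([(init, ())])
    while queue:
        state, trace = queue.popleft()
        if (state, trace) in seen:
            continue
        seen.add((state, trace))
        traces.add(trace)
        for label, nxt in trans[state]:
            if label.startswith("comm"):          # abstraction: comm becomes tau
                queue.append((nxt, trace))
            elif len(trace) < max_len:
                queue.append((nxt, trace + (label,)))
    return traces


traces = visible_traces(x_transitions(), ("start", "t"), 6)
spec = ("inp(d)", "outp(d)") * 3
assert all(t == spec[:len(t)] for t in traces)    # only prefixes of the spec occur
print(len(traces))                                # 7: lengths 0 through 6
```

Passing this check says nothing about branching structure or fairness; it only rules out gross errors in the visible ordering of inp(d) and outp(d).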

Comparing our proof using PVS with a manual proof, one can observe that the main 
proof steps are the same. In a manual proof, however, usually not all intermediate steps 
are written down, whereas a tool such as PVS requires a detailed check of all steps. 
Fortunately, these tedious steps can be automated to a large extent, using the powerful 
rewrite capabilities of PVS. This leads to a higher degree of automation than a related 
verification in the proof checker Coq [5]. The authors of [5] explicitly mention that 
rewriting is not so easy in Coq. 
5 Concluding Remarks



Two approaches have been presented to formulate ACP-like process algebras [2,3] in the language of PVS. Each approach has been validated by applying it to non-trivial examples.

Process terms as an uninterpreted type. Equality on terms is specified by means 
of axioms that can be used as automatic rewrite rules by the PVS proof checker. In this 
framework, we have formalized an algebra of communicating processes with abstrac- 
tion and the binary Kleene star as a limited form of recursion. A disadvantage of this 
approach is the lack of induction principles, which are essential for proofs of theoretical 
results about process algebra. 

Process terms as abstract datatype. In this approach, an additional equivalence 
relation on terms is introduced and axiomatized. By using the abstract-datatype mecha- 
nism of PVS, we obtain convenient induction principles. A disadvantage of this frame- 
work is that substitution in contexts has to be formalized explicitly and rewriting by 
means of the equivalence relation is inconvenient. 

Conclusion. The main conclusion is that mechanical support for process algebra by 
means of PVS is feasible, both for theory development and for concrete applications. 
New in our paper is that we have obtained suitable tool support for the proof of theoretical properties of ACP-style algebras. We have proved an elimination theorem which can be used to prove a property of an algebra by reducing it to a property formulated in basic terms. Besides its use for applications such as those verified in this paper, the elimination theorem also plays a role in completeness proofs for specific models [2,3].

Unfortunately, the ideal framework for theory development differs from the ideal 
framework for concrete applications. It would be a major improvement if the two 
approaches could be combined, allowing both inductive proofs and convenient term rewriting. 
Ideally, this could be achieved by extending the PVS system with the facility to perform 
rewriting on user-defined congruence relations. An alternative is to define powerful 
proof strategies that incorporate general rewrite patterns for congruences. 

As mentioned in the introduction, essentially the same approaches as the ones we 
studied here are investigated in [9], where a CSP-like process algebra is embedded in 
HOL. The conclusions of [9] about rewriting and equational reasoning are similar to 
ours. As a result, the authors express a slight preference for the approach with 
uninterpreted types. However, in [9] only small concrete examples have been studied and 
no theoretical results have been derived. Our work shows that when one is interested 
in theory development for ACP-style process algebras, the approach based on abstract 
datatypes is the only feasible one. Also note that we heavily use the subtyping mechanism 
of PVS (to define basic terms) and dependent types, features which are not supported 
by the HOL system. 

An advantage of using a general-purpose verification tool such as PVS, rather than a 
dedicated process-algebra tool, is the possibility of quickly gaining insight into the tool 
support desired for various fields of application and different process algebras. 
Using the large number of predefined theories and libraries, it is easy to study extensions 
and variations of the framework. As an alternative to PVS, it would be interesting to 
experiment with the generic theorem prover Isabelle [17], since it allows rewriting with 
user-defined congruence relations and ordered rewriting (allowing, e.g., rewriting using 
a commutativity axiom). 
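The following minimal Python sketch illustrates what ordered rewriting provides. It is not Isabelle's implementation. The commutativity axiom x + y = y + x is applied only when it makes the term smaller in a fixed total order, so rewriting always terminates instead of looping. The tuple representation of terms and the order (comparison of printed forms) are assumptions chosen for brevity; real provers use well-founded term orders such as lexicographic path orders.

```python
from typing import Tuple, Union

# Hypothetical term representation: an atom is a string, and a sum
# x + y is the tuple ("+", x, y).
Term = Union[str, Tuple[str, "Term", "Term"]]


def show(t: Term) -> str:
    """Print a term in fully parenthesised infix form."""
    if isinstance(t, str):
        return t
    _, left, right = t
    return f"({show(left)}+{show(right)})"


def smaller(s: Term, t: Term) -> bool:
    """Illustrative total order on terms: compare their printed forms."""
    return show(s) < show(t)


def ordered_rewrite(t: Term) -> Term:
    """Normalise t bottom-up with respect to x + y = y + x, applying the
    axiom only when the result is smaller, so rewriting cannot loop."""
    if isinstance(t, str):
        return t
    _, left, right = t
    current = ("+", ordered_rewrite(left), ordered_rewrite(right))
    swapped = ("+", current[2], current[1])
    return swapped if smaller(swapped, current) else current


# Two commuted variants of the same sum reach one normal form.
print(show(ordered_rewrite(("+", "a", ("+", "c", "b")))))  # ((b+c)+a)
print(show(ordered_rewrite(("+", ("+", "b", "c"), "a"))))  # ((b+c)+a)
```

An unrestricted commutativity rule would rewrite a + b to b + a and back forever. The ordering check is what makes the axiom usable as a rewrite rule.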

Acknowledgments. We would like to thank Jaco van de Pol and Jos Baeten for their 
comments on a draft version of this paper. 
On the Benefits of Using the Up-To Techniques for Bisimulation Verification 
Abstract. We advocate the use of the up-to techniques for bisimulation 
in the field of automatic verification. To this end, we develop a 
tool to perform proofs using the up to structural congruence, the up 
to restrictions and the up to parallel composition proof techniques for 
bisimulation between π-calculus terms. The latter technique is of 
particular interest because it allows one to reason about processes with an 
infinite state space. To use it to full effect, we adapt the “on the fly” 
bisimulation checking algorithm, leading to a form of computational completeness. 
The usefulness of these techniques in dealing with the expressive power 
of the π-calculus is illustrated on two non-trivial examples, namely the 
treatment of persistent data structures and the alternating bit protocol. 
These examples are also good opportunities to study how well-known 
π-calculus encodings behave in the framework of automatic verification. 



Introduction 

This paper studies the applicability of the so-called up-to techniques for 
bisimulation in the field of verification. Bisimilarity, and its associated proof 
method, bisimulation, have become popular notions of equivalence used to reason about 
concurrent systems. While having a simple and clear mathematical definition, 
they are far from straightforward to handle in the framework of automatic 
verification, due to their richness and to the many subtle phenomena they are 
able to capture in the study of concurrency. The up-to techniques for bisimulation 
are presented in [San95]; they can simplify bisimulation proofs by reducing the 
size of the relations one has to consider. More precisely, an up-to technique is 
represented by a function $\mathcal{F}$ from relations to relations, such that proving that 
processes related in a relation $\mathcal{R}$ evolve to processes related in $\mathcal{F}(\mathcal{R})$ is enough 
to show that $\mathcal{R}$ is contained in bisimilarity. One can thus consider relations that 
are smaller than bisimulations, the gap being filled by the application of $\mathcal{F}$. 

In this paper, we concentrate on bisimulation between π-calculus terms; the 
π-calculus [Mil91] has become a widely accepted algebra for modelling concurrency, 
and has demonstrated a great expressive power. It is our belief that, to 
use the π-calculus as a specification language in the field of verification, one should 
be able to cope with the richness of this formalism, in particular by designing 
specific verification techniques for it, rather than only transposing methods that 
are applied to less expressive formalisms. The Mobility WorkBench [VM94] is 
a good example of a verification tool developed specifically for the π-calculus 
(and more recently for its evolution, the fusion calculus), taking advantage in 
particular of the nice characterisations given by open bisimulation [San96]. With these 
ideas in mind, the up-to methods seem to be good candidates for this task: the 
bisimulation up to bisimilarity proof technique, and more importantly the up 
to restrictions and the up to parallel composition proof techniques, have been 
developed in the theoretical study of the π-calculus, and typically allow one to deal 
with name extrusion (up to restrictions proof technique) and with replication 
(up to parallel composition proof technique). 

The feasibility of automating the up-to techniques has been addressed in 
[Hir98], which introduces methods to decide whether a pair of processes belongs to a 
relation up to structural congruence, up to restrictions, and up to parallel composition. 
These methods lead to the definition of a general verification 
algorithm for up-to bisimulation between π-calculus terms, embedded in a prototype 
tool. In the present work, we extend the capabilities of the system, and, 
more importantly, try to evaluate the outcome of such an effort through the 
study of two non-trivial examples that exploit the expressive power of the π-calculus. 
With respect to [Hir98], the current version of the system is provided with a 
richer notion of structural congruence, and, moreover, the general bisimulation 
verification algorithm is modified in order to take into account the way the up 
to parallel composition proof technique works. 

The plan of the paper is as follows: in Section 1, we introduce the formal 
background on the π-calculus and the up-to techniques for bisimulation. Section 2 is 
devoted to the description of our implementation, and of the general bisimulation 
algorithm we use to handle the up to parallel composition proof technique in a 
computationally complete fashion. We then describe two case studies: persistent 
data structures, as introduced by Milner [Mil91], are examined in Section 3, 
while Section 4 is devoted to the study of the alternating bit protocol, in an 
encoding adapted from [Mam98]. We finally conclude by discussing future work. 

1 Preliminaries 

In this section, we introduce the syntax and semantics of the language we use, 
a restricted though expressive subset of the polyadic π-calculus where replication is 
allowed only on prefixed processes. We then define bisimulation, together with 
the up-to techniques for bisimulation. 

Definition 1 (Syntax) Given an infinite countable set of names $\mathcal{N}$, ranged 
over by $a, b, c, \ldots, p, q, \ldots, x, y, \ldots$, we range over (possibly empty) name lists 
with $\tilde a, \tilde b, \ldots$, and we define prefixes, ranged over by $\alpha, \beta$, as follows: 

$$\alpha ::= a(\tilde b) \;\;\big|\;\; \overline{a}[\tilde b]$$

Processes, ranged over by $P, Q, R, \ldots$, are then defined as follows: 

$$P ::= 0 \;\;\big|\;\; \alpha.P \;\;\big|\;\; (\nu x)P \;\;\big|\;\; P_1 \mid P_2 \;\;\big|\;\; !\alpha.P$$

Prefixed processes are either receptions (prefix $a(\tilde b)$) or emissions ($\overline{a}[\tilde b]$); $0$ 
is the inactive process; the restriction operator $\nu$ makes a name private to the 
restricted process; parallel composition is written $\mid$, and $!$ stands for replication, 
allowed only on prefixed processes; intuitively, a replicated process represents 
any number of copies of this process put in parallel. 
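As a minimal sketch, the grammar of Definition 1 can be rendered as immutable Python datatypes, with replication typed so that it only applies to prefixed processes. The class names are hypothetical and are not taken from the authors' O'Caml tool. The printer approximates the concrete syntax of the tool sessions in Section 3 (~ for restriction, [] for output, () for input), inferred from those sessions only.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Nil:
    """The inactive process 0."""


@dataclass(frozen=True)
class Prefix:
    """alpha.P, with alpha an input a(b~) or an output a[b~]."""
    is_input: bool
    subject: str
    objects: Tuple[str, ...]
    cont: "Proc"


@dataclass(frozen=True)
class Res:
    """(nu x)P: the name x is private to P."""
    name: str
    body: "Proc"


@dataclass(frozen=True)
class Par:
    """P1 | P2."""
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Bang:
    """!alpha.P: the field type only admits prefixed processes."""
    guarded: Prefix


Proc = Union[Nil, Prefix, Res, Par, Bang]


def show(p: Proc) -> str:
    """Print p in a syntax resembling the tool sessions of Section 3."""
    if isinstance(p, Nil):
        return "0"
    if isinstance(p, Prefix):
        objs = ";".join(p.objects)
        if p.is_input:
            head = f"{p.subject}({objs})" if p.objects else p.subject
        else:
            head = f"{p.subject}[{objs}]"
        if isinstance(p.cont, Nil):          # omit a trailing .0
            return head
        body = show(p.cont)
        return f"{head}.({body})" if isinstance(p.cont, Par) else f"{head}.{body}"
    if isinstance(p, Res):
        body = show(p.body)
        return f"(~{p.name})({body})" if isinstance(p.body, Par) else f"(~{p.name}){body}"
    if isinstance(p, Par):
        return f"{show(p.left)} | {show(p.right)}"
    return "!" + show(p.guarded)             # Bang


# An R1-like cell from Section 3.
r1 = Res("b", Par(Bang(Prefix(True, "l0", ("c",), Prefix(False, "c", ("b",), Nil()))),
                  Prefix(True, "b", ("t", "f"), Prefix(False, "t", (), Nil()))))
print(show(r1))   # (~b)(!l0(c).c[b] | b(t;f).t[])
```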

Conventions and notations: In prefixes $a(\tilde b)$ and $\overline{a}[\tilde b]$, $a$ and $\tilde b$ are called 
respectively the subject and the object parts of the prefix. We shall omit the 
object part of a prefix when it is empty, and use a monadic notation for single-name 
object parts (thus writing e.g. $a(b)$ and $\overline{a}b$). We shall also omit the 
continuation of a prefix when it is the inactive process $0$. Free and bound names 
are defined by saying that restriction and abstraction (embodied in the input 
prefix) are binding operators. As usual, we shall silently use $\alpha$-conversion to 
avoid capture of bound names. 
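The binding conventions can be made concrete with a short sketch that computes free names. Processes are encoded here as hypothetical tagged tuples ("nil", "in", "out", "res", "par", "bang"), not as the tool's own data structures. Input prefixes and restriction bind names; outputs bind nothing.

```python
from typing import FrozenSet


def fn(p: tuple) -> FrozenSet[str]:
    """Free names of a process encoded as a tagged tuple."""
    tag = p[0]
    if tag == "nil":
        return frozenset()
    if tag == "in":                       # a(b~).P binds b~ in P
        _, a, bs, cont = p
        return frozenset({a}) | (fn(cont) - frozenset(bs))
    if tag == "out":                      # a[b~].P binds nothing
        _, a, bs, cont = p
        return frozenset({a}) | frozenset(bs) | fn(cont)
    if tag == "res":                      # (nu x)P binds x
        _, x, body = p
        return fn(body) - {x}
    if tag == "par":
        return fn(p[1]) | fn(p[2])
    if tag == "bang":                     # !alpha.P has the names of alpha.P
        return fn(p[1])
    raise ValueError(f"unknown constructor {tag!r}")


# (~b)(!l0(c).c[b] | b(t;f).t[]): only l0 is free.
r1 = ("res", "b", ("par",
                   ("bang", ("in", "l0", ("c",), ("out", "c", ("b",), ("nil",)))),
                   ("in", "b", ("t", "f"), ("out", "t", (), ("nil",)))))
print(sorted(fn(r1)))   # ['l0']
```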

Definition 2 (Transition system) The operational semantics of π-calculus 
terms is defined as a labeled transition system. Actions, ranged over by $\mu$, 
are given by the following syntax: 

$$\mu ::= a(\tilde b) \;\;\big|\;\; (\nu \tilde b')\,\overline{a}[\tilde b] \;\;(\tilde b' \subseteq \tilde b) \;\;\big|\;\; \tau$$

which reads as follows: an action is either a reception, a (possibly bound) output, 
or the silent action $\tau$, denoting internal communication. Bound and free names 
of actions are defined as usual. 

The judgment $P \xrightarrow{\mu} P'$, meaning that process $P$ can perform action $\mu$ 
and become $P'$, is defined by the rules of Figure 1 (the symmetrical versions of rules 
PAR₁ and CLOSE₁ are omitted; $::$ is the constructor used to add an element to a 
list). Note that we adopt an “early” version of the operational semantics. 

The notion of semantic equivalence on processes we are interested in is 
bisimilarity, defined as follows: 

Definition 3 (Bisimulation, bisimilarity) A relation $\mathcal{R}$ over processes is a 
bisimulation iff, whenever $P \mathrel{\mathcal{R}} Q$ and $P \xrightarrow{\mu} P'$, there exists a process $Q'$ s.t. 
$Q \xrightarrow{\mu} Q'$ and $P' \mathrel{\mathcal{R}} Q'$, and conversely for the transitions of $Q$. Bisimilarity, 
written $\sim$, is the greatest bisimulation. 
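The sketch below makes Definition 3 concrete on an explicitly given finite labelled transition system. That representation is an assumption: real π-calculus terms have infinitely many early input transitions, which it abstracts away. The code checks whether a relation is a bisimulation. It also computes bisimilarity as the greatest bisimulation, by repeatedly discarding pairs whose moves cannot be matched.

```python
from typing import Dict, List, Set, Tuple

LTS = Dict[str, List[Tuple[str, str]]]   # state -> [(action, successor)]
Rel = Set[Tuple[str, str]]


def matched(lts: LTS, rel: Rel, p: str, q: str) -> bool:
    """Every move of p is matched by a same-label move of q into rel,
    and every move of q by a same-label move of p into rel."""
    fwd = all(any(b == a and (p2, q2) in rel for b, q2 in lts.get(q, []))
              for a, p2 in lts.get(p, []))
    bwd = all(any(b == a and (p2, q2) in rel for b, p2 in lts.get(p, []))
              for a, q2 in lts.get(q, []))
    return fwd and bwd


def is_bisimulation(lts: LTS, rel: Rel) -> bool:
    """Definition 3 on a finite, explicitly given transition system."""
    return all(matched(lts, rel, p, q) for p, q in rel)


def bisimilarity(lts: LTS) -> Rel:
    """Greatest bisimulation: start from all pairs of states and discard
    pairs with an unmatched move until nothing changes."""
    states = set(lts) | {s for succs in lts.values() for _, s in succs}
    rel = {(p, q) for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not matched(lts, rel, *pair):
                rel.discard(pair)
                changed = True
    return rel


# a.(b.0 + c.0) versus a.b.0 + a.c.0: same traces, not bisimilar.
lts: LTS = {
    "p0": [("a", "p1")], "p1": [("b", "p2"), ("c", "p3")],
    "q0": [("a", "q1"), ("a", "q2")], "q1": [("b", "q3")], "q2": [("c", "q4")],
}
print(("p0", "q0") in bisimilarity(lts))   # False
```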



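Definition 4 (Structural congruence) Structural congruence, written $\equiv$, is 
the smallest equivalence relation on π-calculus terms that is a congruence 
generated by the following rules: 

$$P \mid 0 \equiv P \qquad P \mid Q \equiv Q \mid P \qquad P \mid (Q \mid R) \equiv (P \mid Q) \mid R \qquad (\nu x)0 \equiv 0$$

$$!\alpha.P \mid \alpha.P \equiv !\alpha.P \qquad !\alpha.P \mid !\alpha.P \equiv !\alpha.P \qquad (\nu x)P \mid Q \equiv (\nu x)(P \mid Q) \;\text{ if } x \notin fn(Q)$$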
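$$(\textsc{Inp})\;\; a(\tilde b).P \xrightarrow{a(\tilde c)} P\{\tilde c/\tilde b\} \;\; (|\tilde c| = |\tilde b|) \qquad (\textsc{Out})\;\; \overline{a}[\tilde b].P \xrightarrow{\overline{a}[\tilde b]} P$$

$$(\textsc{Bang})\;\; \frac{\alpha.P \xrightarrow{\mu} P'}{!\alpha.P \xrightarrow{\mu} !\alpha.P \mid P'} \qquad (\textsc{Res})\;\; \frac{P \xrightarrow{\mu} P'}{(\nu x)P \xrightarrow{\mu} (\nu x)P'}\;\; x \notin n(\mu)$$

$$(\textsc{Par}_1)\;\; \frac{P \xrightarrow{\mu} P'}{P \mid Q \xrightarrow{\mu} P' \mid Q}\;\; bn(\mu) \cap fn(Q) = \emptyset \qquad (\textsc{Open})\;\; \frac{P \xrightarrow{(\nu\tilde b')\overline{a}[\tilde b]} P'}{(\nu t)P \xrightarrow{(\nu\, t::\tilde b')\overline{a}[\tilde b]} P'}\;\; t \neq a,\ t \in \tilde b \setminus \tilde b'$$

$$(\textsc{Close}_1)\;\; \frac{P \xrightarrow{a(\tilde b)} P' \qquad Q \xrightarrow{(\nu\tilde b')\overline{a}[\tilde b]} Q'}{P \mid Q \xrightarrow{\tau} (\nu\tilde b')(P' \mid Q')}\;\; \tilde b' \cap fn(P) = \emptyset$$

Fig. 1. Operational semantics 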



Note that the rule for processes of the form $!\alpha.P \mid !\alpha.P$ is new with respect 
to the usual definition of structural congruence, and defines a straightforward 
extension of this relation. Note as well that $\equiv$ is included in $\sim$. 

We shall actually use a slightly enriched version of structural congruence, 
where we add the following law: 

$$(\nu x)\,\alpha.P \equiv 0 \qquad (x \text{ occurs in } \alpha \text{ in subject position})$$

This rule is usually not included in the definition of structural congruence, 
because its meaning is too “semantical”, i.e. it expresses a behavioural property 
of a process rather than a geometrical one, as structural congruence is intended 
to do. However, in the framework of the up to bisimilarity proof technique, it 
is preferable to equate as many terms as possible, hence an extended version of 
structural congruence is of interest; moreover, this particular law will turn out 
to be very useful in the example treated in Section 4. We extend the notation $\equiv$ 
to our “enriched” structural congruence relation. 
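A partial normalisation for $\equiv$ can be sketched in Python under the following assumptions. Processes use hypothetical tagged tuples. Parallel components are collected into a sorted multiset, which covers the laws for $\mid$ and $0$. A replicated copy absorbs the matching plain prefix and duplicate replicated copies. The enriched law $(\nu x)\alpha.P \equiv 0$ is applied when $x$ is the subject of $\alpha$. Scope extrusion and $\alpha$-conversion are not handled, so equal normal forms imply congruence but not conversely. This is not the normalisation procedure of [Hir98].

```python
from typing import Tuple

# Hypothetical encoding: ("nil",), ("in", a, names, P), ("out", a, names, P),
# ("res", x, P), ("par", P, Q), ("bang", prefix) with prefix an "in"/"out".


def components(p: tuple) -> Tuple[tuple, ...]:
    """Normal form of p as a sorted tuple of parallel components."""
    tag = p[0]
    if tag == "nil":                                  # P | 0 = P
        return ()
    if tag == "par":                                  # | is assoc. and comm.
        comps = list(components(p[1]) + components(p[2]))
    elif tag in ("in", "out"):
        _, a, names, cont = p
        comps = [(tag, a, tuple(names), components(cont))]
    elif tag == "bang":
        ptag, a, names, cont = p[1]
        comps = [("bang", (ptag, a, tuple(names), components(cont)))]
    elif tag == "res":
        _, x, body = p
        inner = components(body)
        if not inner:                                 # (nu x)0 = 0
            return ()
        if len(inner) == 1 and inner[0][0] in ("in", "out") and inner[0][1] == x:
            return ()                                 # (nu x)alpha.P = 0, x subject
        comps = [("res", x, inner)]
    else:
        raise ValueError(f"unknown constructor {tag!r}")
    bangs = {c for c in comps if c[0] == "bang"}      # !a.P | !a.P = !a.P
    rest = [c for c in comps                          # !a.P | a.P = !a.P
            if c[0] != "bang" and ("bang", c) not in bangs]
    return tuple(sorted(rest + list(bangs)))


def congruent(p: tuple, q: tuple) -> bool:
    """Sound but incomplete test for the enriched structural congruence."""
    return components(p) == components(q)


rep = ("bang", ("in", "a", (), ("nil",)))
print(congruent(("par", rep, ("in", "a", (), ("nil",))), rep))   # True
```

Plain prefixes are deliberately not deduplicated, because $\alpha.P \mid \alpha.P$ and $\alpha.P$ are not structurally congruent.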

To introduce up-to bisimulation (see [San95]), we consider functions from 
relations to relations, ranged over by $\mathcal{F}$, and modify the definition of bisimulation: 

Definition 5 (Up-to bisimulation) Given a function $\mathcal{F}$ from relations to 
relations, we say that a relation $\mathcal{R}$ is a bisimulation up to $\mathcal{F}$ iff the property of 
Definition 3 holds when we replace “$P' \mathrel{\mathcal{R}} Q'$” by “$P' \mathrel{\mathcal{F}(\mathcal{R})} Q'$”. 
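Definition 5 can also be checked mechanically on a finite transition system. This sketch again assumes an explicitly given graph. An up-to function is passed as a Python function from a relation to a membership test. Two sample functions are provided. The identity gives back Definition 3. Closure under reflexivity is a simple technique chosen only for illustration, and it lets the relation omit pairs of identical states.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
LTS = Dict[str, List[Tuple[str, str]]]
Member = Callable[[Pair], bool]


def is_bisim_up_to(lts: LTS, rel: Set[Pair],
                   up_to: Callable[[Set[Pair]], Member]) -> bool:
    """Definition 5: derivatives of related states need only be related
    by up_to(rel) instead of rel itself."""
    in_f = up_to(rel)

    def answers(p: str, q: str, flipped: bool) -> bool:
        # each move of p must be answered by a same-label move of q
        for a, p2 in lts.get(p, []):
            if not any(b == a and in_f((q2, p2) if flipped else (p2, q2))
                       for b, q2 in lts.get(q, [])):
                return False
        return True

    return all(answers(p, q, False) and answers(q, p, True) for p, q in rel)


def identity(rel: Set[Pair]) -> Member:          # F(R) = R: Definition 3
    return lambda pair: pair in rel


def up_to_reflexivity(rel: Set[Pair]) -> Member:  # F(R) = R ∪ Id
    return lambda pair: pair in rel or pair[0] == pair[1]


# p0 and q0 both move on a to the same state s, whose behaviour is cyclic.
lts: LTS = {"p0": [("a", "s")], "q0": [("a", "s")],
            "s": [("b", "t")], "t": [("c", "s")]}
print(is_bisim_up_to(lts, {("p0", "q0")}, identity))           # False
print(is_bisim_up_to(lts, {("p0", "q0")}, up_to_reflexivity))  # True
```

The one-pair relation $\{(p_0, q_0)\}$ is not a bisimulation, but it is a bisimulation up to reflexivity. This is the kind of size reduction the up-to techniques aim at.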

Intuitively, an up-to bisimulation relation $\mathcal{R}$ is “smaller” than a bisimulation, 
the gap being filled by the function $\mathcal{F}$, which helps build the “future” of processes 
related in $\mathcal{R}$. This can be useful, as it allows us to reduce the size of the relations 
we handle when proving bisimilarity results. This is possible when we 
are dealing with a correct function $\mathcal{F}$: a function $\mathcal{F}$ is said to be correct when 
($\mathcal{R}$ is a bisimulation up to $\mathcal{F}$) implies ($\mathcal{R} \subseteq {\sim}$). In the context of the π-calculus, we 
shall use the up-to techniques given by the following proposition: 
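Proposition 6 ([San95]) The up to structural congruence, up to restrictions 
and up to parallel composition proof techniques for π-calculus terms are defined 
by the following functions over relations, respectively called $\mathcal{F}_1$, $\mathcal{F}_2$ and $\mathcal{F}_3$: 

$$\mathcal{F}_1(\mathcal{R}) \stackrel{\mathrm{def}}{=} \{(P,Q) \;;\; \exists P_0, Q_0 \text{ s.t. } P \equiv P_0,\ P_0 \mathrel{\mathcal{R}} Q_0 \text{ and } Q_0 \equiv Q\}$$

$$\mathcal{F}_2(\mathcal{R}) \stackrel{\mathrm{def}}{=} \{(P,Q) \;;\; \exists P_0, Q_0, \tilde x \text{ s.t. } P = (\nu\tilde x)P_0,\ Q = (\nu\tilde x)Q_0 \text{ and } P_0 \mathrel{\mathcal{R}} Q_0\}$$

$$\mathcal{F}_3(\mathcal{R}) \stackrel{\mathrm{def}}{=} \{(P,Q) \;;\; \exists P_0, Q_0, T \text{ s.t. } P = P_0 \mid T,\ Q = Q_0 \mid T \text{ and } P_0 \mathrel{\mathcal{R}} Q_0\}$$

Functions $\mathcal{F}_1$, $\mathcal{F}_2$ and $\mathcal{F}_3$ define correct up-to proof techniques, i.e. proving that 
$\mathcal{R}$ is a bisimulation up to $\mathcal{F}$ (where $\mathcal{F}$ is one of the aforementioned functions) is 
sufficient to prove that $\mathcal{R}$ relates bisimilar processes. Moreover, these functions 
can be combined together, still yielding a correct technique. 

Note that $\mathcal{F}_1$ corresponds to a restriction of the general bisimulation up to 
bisimilarity technique (where $\sim$ would replace $\equiv$ in the definition of $\mathcal{F}_1$). 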
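Deciding membership in $\mathcal{F}_3(\mathcal{R})$ becomes simple once processes are flattened into multisets of parallel components, as structural normalisation allows. For a finite $\mathcal{R}$, one looks for a pair $(P_0, Q_0)$ such that removing $P_0$ from $P$ and $Q_0$ from $Q$ leaves the same context $T$. The sketch below assumes that representation, with components written as opaque strings. It covers only $\mathcal{F}_3$, not the full decision procedure of [Hir98] for $\mathcal{F}_1 \circ \mathcal{F}_2 \circ \mathcal{F}_3$.

```python
from collections import Counter
from typing import Set, Tuple

Multiset = Tuple[str, ...]          # parallel components of a normalised term


def in_up_to_par(p: Multiset, q: Multiset,
                 rel: Set[Tuple[Multiset, Multiset]]) -> bool:
    """Decide (P, Q) in F3(R) for a finite R: look for (P0, Q0) in R and
    a common context T with P = P0 | T and Q = Q0 | T (as multisets)."""
    cp, cq = Counter(p), Counter(q)
    for p0, q0 in rel:
        c0, d0 = Counter(p0), Counter(q0)
        if c0 - cp or d0 - cq:       # P0 (or Q0) is not a sub-multiset
            continue
        if cp - c0 == cq - d0:       # the leftover contexts T coincide
            return True
    return False


rel = {(("a.0",), ("b.0",))}
print(in_up_to_par(("a.0", "!c.0"), ("!c.0", "b.0"), rel))   # True, T = !c.0
print(in_up_to_par(("a.0", "!c.0"), ("b.0",), rel))          # False
```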

2 Automating Up-To Bisimulation 

We discuss here the questions related to the automation of the up-to proof 
techniques for bisimulation. As stated before, the improvements with respect to 
[Hir98] are twofold: firstly, a richer relation (the enriched structural congruence) is 
handled in the bisimulation up to bisimilarity proof technique, and secondly, the general 
up-to bisimulation checking algorithm is modified in order to perform breadth-first search 
(and to handle expansion as well: see Section 4). 

2.1 Deciding Up To Closure 

The cornerstone of our method is the ability to decide whether a pair of 
processes belongs to the closure of a relation up to some correct proof technique; in 
the sequel, we concentrate on our most powerful proof technique, namely the up 
to (extended) structural congruence, up to restrictions and up to parallel 
composition proof technique, embodied by the function $\mathcal{F}_1 \circ \mathcal{F}_2 \circ \mathcal{F}_3$ (see Proposition 6 
above), where $\circ$ denotes function composition. 

Proposition 7 ([Hir98]) Given a pair $(P,Q)$ of processes and a relation $\mathcal{R}$, 
we can decide whether $(P,Q) \in \mathcal{F}_1 \circ \mathcal{F}_2 \circ \mathcal{F}_3(\mathcal{R})$. 

The proof given in [Hir98] is straightforwardly adapted to handle extended 
structural congruence as defined above. Let us now explain how we exploit this 
result for up-to bisimulation verification. 

2.2 The General Checking Method 

As we face the task of bisimulation verification, an immediate but decisive 
remark is that functions $\mathcal{F}_1$, $\mathcal{F}_2$ and $\mathcal{F}_3$ are syntactical operators; this 
compels us to use an “on the fly” algorithm for bisimulation checking, in order to 
have access to the actual term corresponding to a given state in the verification 
process (as opposed, for example, to the partition refinement algorithm [PT87], 
which works on unfoldings of processes). The “classical” on the fly checking 
method [FM91] can easily be adapted to the up-to methods, so as to define a 
semi-decision procedure for up-to bisimulation verification in the π-calculus, as 
is done in [Hir98]. However, to exploit the up-to methods to full effect, we need 
to modify the general bisimulation verification algorithm; this is the subject of 
the remainder of this section. 

Indeed, one of the crucial improvements brought by the up-to techniques is 
the ability, through the up to parallel composition proof technique, to handle 
replicated terms in some cases, i.e. terms possibly having an infinite behaviour. 
Thus, when applied, this proof method can allow one to cut infinite branches in 
the state space of the processes being compared. Using depth-first search, as the 
original algorithm of [FM91] does, we may miss two matching 
branches which could be cut using the up to parallel composition technique, and 
instead enter an infinite loop (corresponding to an infinite growth of the state 
space). To avoid this, it is preferable to adopt a breadth-first strategy, in order 
to enter an infinite loop only when it is unavoidable. Figure 2 presents the overall 
behaviour of Mounier’s original “on the fly” algorithm, which progressively builds 
a candidate bisimulation relation by exploring the states that can be reached 
starting from two processes. 



W := ∅;
(*) ℛ := {(P,Q)}; V := ∅; R := ∅; status := true; insert (P,Q) in S;
while S is not empty and status = true do
    choose a pair (P0,Q0) in S and remove it from S;
    if (P0,Q0) succeeds then (add (P0,Q0) to V (or R, see below); propagate)
    else if (P0,Q0) fails then
        if (P0,Q0) ∈ R
        then (remove (P0,Q0) from V and insert it in W; status := false; propagate)
        else (insert (P0,Q0) in W; propagate)
    else (* neither succeeds nor fails *)
        compute the successors of (P0,Q0) and insert them in S;
endwhile;
if P ~ Q then (if status then true else loop back to (*)) else false

Fig. 2. General “On The Fly” Bisimulation Checking Algorithm 



Let us explain informally how the verification method works (a more detailed 
description can be found in [FM91] and in the author’s forthcoming PhD thesis). 
The algorithm explores the state space given by the Cartesian product of the 
transition systems induced by the two processes being checked. The data 
structures involved in the algorithm are: a structure S (we shall return later on to the 
nature of S), which contains the states that still have to be examined; and three sets 
V, W, and R, containing the pairs of processes that are respectively supposed to 
be bisimilar, known to be non-bisimilar, and known to be bisimilar. When two 
new processes turn out to be bisimilar, if the corresponding pair is in V, it is 
stored in R, and symmetrically for the case of non-bisimilarity (from V to W). 
It can be the case, though, that during the exploration of the state space, an 
incorrect hypothesis is assumed regarding the bisimilarity of two processes (i.e. a 
pair in R is actually made of non-bisimilar processes): to handle this, a boolean 
flag called status records the “reliability” of the current computation. When a 
computation leads to an unreliable result, we start all over again, keeping track 
in W of the states that have been proved non-bisimilar. 

We still have to explain the behaviour of functions succeeds, fails, and 
propagate: function fails tests whether a failure can be detected by inspecting the 
immediate transitions of the two processes to be examined (a failure occurs if one 
process performs an action that cannot be mimicked by the other term), or whether the 
two processes are known to be non-bisimilar (set W). Symmetrically, succeeds 
checks whether the two processes are trivially bisimilar (i.e. if they are α-convertible or 
both have no transition) or whether there is an assumption stating bisimilarity between 
them (sets V and R). Finally, function propagate propagates newly found 
information along the state space, possibly discovering new (non-)bisimilarity 
properties, or cutting out branches that do not have to be explored anymore. 

As expected, the nature of the structure S determines the exploration strategy: 
with respect to Mounier’s algorithm, we replace the stack of states that have 
to be examined by a queue, thus performing breadth-first instead of depth-first 
search. Some extra information regarding the representation of the “vertical” 
structure of the state space (i.e. the relationships between states due to the 
transition relation), which comes for free in the case of a stack structure, has to 
be provided within the objects that are stored in S¹. Using breadth-first search, 
we get a computationally complete checking algorithm: 

Theorem 8 We say that a bisimulation checking algorithm is computationally 
complete with respect to a given up-to technique $\mathcal{F}$ whenever it diverges if and 
only if no finite bisimulation up to $\mathcal{F}$, relating the two processes to be checked 
and derivatives of these processes, exists. 

With this terminology, the breadth-first version of the algorithm of Figure 
2 is computationally complete with respect to the function $\mathcal{F}_1 \circ \mathcal{F}_2 \circ \mathcal{F}_3$ (see 
Proposition 6). 

The proof of this result follows immediately from the very definition of 
breadth-first search. Note that the hypothesis that the relation contains only 
derivatives of the processes we examine is crucial for the proof. What is 
really important here is the form of completeness we get: while for finite-state 
processes, depth-first and breadth-first searches differ only “strategically”, in 
the case of replicated terms, the breadth-first version, in conjunction with the 
up to parallel composition proof technique, provides a real gain in expressiveness, 
as expressed by Theorem 8. This improvement of the system, together with the 
additional structural congruence law seen above, will be decisive for the 
treatment of the examples in Sections 3 and 4. 

¹ This structural information is even more intricate in the weak case, where some care 
has to be taken to avoid $\tau$-loops before an action actually fires (i.e. in the first 
$\Rightarrow$ part of $\stackrel{\mu}{\Longrightarrow}$, see Section 4). 
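The effect of replacing the stack by a queue can be sketched on a finite, explicitly given transition system. This simplified version explores the product state space breadth-first from the initial pair. It stops expanding pairs whose immediate actions differ, as fails would. It then discards unmatched pairs until a bisimulation remains. The V, W and R bookkeeping, the status flag and the up-to cuts of the actual algorithm are omitted, so the sketch only illustrates the search order.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

LTS = Dict[str, List[Tuple[str, str]]]


def actions(lts: LTS, s: str) -> Set[str]:
    return {a for a, _ in lts.get(s, [])}


def bfs_bisimilar(lts: LTS, p: str, q: str) -> bool:
    queue = deque([(p, q)])      # FIFO queue: pairs are explored level by level
    seen = {(p, q)}
    while queue:
        p0, q0 = queue.popleft()
        if actions(lts, p0) != actions(lts, q0):
            continue             # immediate failure: do not expand this pair
        for a, p1 in lts.get(p0, []):
            for b, q1 in lts.get(q0, []):
                if a == b and (p1, q1) not in seen:
                    seen.add((p1, q1))
                    queue.append((p1, q1))
    # Keep only explored pairs whose moves are all matched inside the set.
    rel = set(seen)
    changed = True
    while changed:
        changed = False
        for p0, q0 in list(rel):
            fwd = all(any(b == a and (p1, q1) in rel for b, q1 in lts.get(q0, []))
                      for a, p1 in lts.get(p0, []))
            bwd = all(any(b == a and (p1, q1) in rel for b, p1 in lts.get(p0, []))
                      for a, q1 in lts.get(q0, []))
            if not (fwd and bwd):
                rel.discard((p0, q0))
                changed = True
    return (p, q) in rel


loops: LTS = {"x": [("a", "x")], "y": [("a", "y2")], "y2": [("a", "y")]}
print(bfs_bisimilar(loops, "x", "y"))   # True
```

Restricting the refinement to explored pairs is sound. Any bisimulation containing the initial pair, intersected with the explored pairs, is still a bisimulation.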



2.3 A Tool for Up To Bisimulation Verification 

The methods presented so far are implemented in a tool, written in O’Caml, 
and running under Unix. The system allows the user to define a pair of 
processes and to check whether they are bisimilar by choosing between three 
up-to proof techniques and between strong bisimulation and expansion (see below 
for the definition of this notion). Facilities for interactively simulating the 
behaviour of a process and for debugging (in the case of a check failure) are also 
provided. Internally, the system works systematically up to structural 
congruence, i.e. with normalised terms, as defined in [Hir98]. The tool is available at 
http://cermics.enpc.fr/~dh/pi. 



3 First Case Study: Persistent Values and Sharing 

One of the applications of the up to parallel composition proof technique is the 
proof of the so-called replication theorems, which express some properties of 
processes that model resources. Intuitively, a term of the form $(\nu a)(!a(\tilde b).P \mid Q)$ is 
viewed as an agent $Q$ having access to a private resource $P$, located at channel 
$a$, with the possibility of instantiating the resource with some parameters $\tilde b$. The 
resource is usually replicated, because it is meant to be “always available”. 
Resource processes of the form $!a(\tilde b).P$ are ubiquitous in the study of the π-calculus: 
they can be found in particular in the encodings of the λ-calculus, where 
application is represented as (the translation of) the function having access to its 
argument as a resource. Resource processes arise also in the study of the 
higher-order π-calculus, of object calculi, and of data structures. We focus here on the 
latter subject, by studying Milner’s encoding of lists in the π-calculus. 

In [Mil91], lists are represented using two kinds of π-calculus terms, one for 
each constructor: a process of the form $l(c,n).\overline{c}[v,l']$ represents 
a Cons node, situated at $l$, and containing a value that can be accessed at 
channel $v$ and a reference to the remainder of the list situated at $l'$. The term 
$(\nu l', v)(l(c,n).\overline{c}[v,l'] \mid v.V \mid l'.L)$ is thus seen as the list Cons(V, L), where V is 
the value located at $v$ and L is the tail of the list, located at $l'$. A process of the 
form $l(c,n).\overline{n}$ represents the empty list Nil situated at $l$. Lists can be 
interrogated by sending two names at their location, one for each possible constructor. 

Such data structures, however, are linear: reading them destroys them; to 
make them persistent, one uses the replication operator. As said in [Mil91], there 
are a priori two ways to achieve this, depending on whether one chooses to replicate 
the nodes and the value cells in the lists, or directly the subcomponents. Accordingly, 
the list Cons(V, L) seen above would become 

$$(\nu l', v)(!l(c,n).\overline{c}[v,l'] \mid !v.V \mid \ldots) \qquad\text{or}\qquad !l(c,n).(\nu l', v)(\overline{c}[v,l'] \mid !v.V \mid \ldots)$$

respectively (the dots stand here for the continuation of the lists, whose 
shape depends on the nature of the topmost constructor of L, Cons or Nil: what 
is relevant here is the expression of a Cons node in a persistent list). As we shall 
see, this design choice is reflected both in the behaviour of the corresponding 
data structure and in questions related to bisimulation checking. 

To illustrate this, we shall work in the following with a very simple list, 
consisting only of the element true²; its representation as an ephemeral data 
structure, given by $R_0 \stackrel{\mathrm{def}}{=} (l_0(c,n).\overline{c}[b,l_1] \mid !b.\mathit{true} \mid l_1(c,n).\overline{n})$, is simplified 
into $R \stackrel{\mathrm{def}}{=} (\nu b)(l_0(c).\overline{c}b \mid !b.\mathit{true})$. This process can be viewed as a single constant 
cell holding the value true, and is only reminiscent of the list [true]; however, 
it will be sufficient for our task, and the reasoning made below holds for “real” 
lists as well. As said above, there are two ways to transform $R$ into a persistent 
structure, represented by the following two processes: 

$$R_1 \stackrel{\mathrm{def}}{=} (\nu b)(!l_0(c).\overline{c}b \mid !b.\mathit{true}) \qquad\text{and}\qquad R_2 \stackrel{\mathrm{def}}{=} !l_0(c).(\nu b)(\overline{c}b \mid !b.\mathit{true})$$

As stated in [Mil91], the difference between $R_1$ and $R_2$ is reflected in the sharing 
properties of our data structures. The problem of value sharing can be expressed 
in terms of security: suppose we have an agent A, willing to interrogate the 
resource R located at $l_0$ any number of times, and to send a signal on either one 
of two channels o and p, depending on the value read at $l_0$. This agent can be 
represented by the process 

$$A \stackrel{\mathrm{def}}{=} !\overline{l_0}c_1.c_1(b).(\nu t, f)(\overline{b}[t,f] \mid t.\overline{o} \mid f.\overline{p}),$$

where $c_1$ is a name that is specific to A for its communications with R. Suppose 
now that an evil agent E is willing to interfere with the communications of A, 
by interrogating R and trying to send the value false as if this were the value of 
the cell located at $l_0$. To do this, E needs to interrogate R once, to get the name b 
where the value is located, and then “mimic” false along b. Hence: 

$$E \stackrel{\mathrm{def}}{=} \overline{l_0}c_2.c_2(b).b(t,f).\overline{f}.$$

Our whole system is made of the parallel composition of the resource (being 
either $R_1$ or $R_2$), the “innocent” agent A, and the “evil” agent E: 

$$\mathit{Sys}_i \stackrel{\mathrm{def}}{=} (\nu l_0, c_1, c_2)(R_i \mid A \mid E) \qquad i = 1, 2.$$

² Along the lines of the encoding of booleans in the λ-calculus, the value true located 
at b, written $b.\mathit{true}$, is a process waiting for two names t and f, and returning a signal 
on t (false would return a signal on f): we write $b.\mathit{true} \stackrel{\mathrm{def}}{=} b(t,f).\overline{t}$ (see [Mil91]). 
We verify that, depending on which representation we take for the persistent 
cell R, E can either disturb the communications performed by A, or be innocuous. 
To this end, we compare the behaviour of the term $\mathit{Sys}_i$ for $i = 1,2$ with 
the behaviour of the process where agents A and E each have their own private 
copy of the resource (and hence cannot interfere): 

$$\mathit{Sys}'_i \stackrel{\mathrm{def}}{=} (\nu l_0, c_1)(R_i \mid A) \mid (\nu l_0, c_2)(R_i \mid E) \qquad i = 1, 2.$$

Indeed, if we run the tool with these definitions, we get the following results: 

- case $\mathit{Sys}_1$: in this case, a location b for the value true is created once and for all 
and transmitted to every process that interrogates the list at $l_0$. This way, 
even in the case where A performs only one request at $l_0$, agent E can send 
the wrong value false to A, and $\mathit{Sys}_1$ and $\mathit{Sys}'_1$ are not bisimilar. We check 
this in a simple session of our tool; we first define the two components of 
the pair of processes to be checked: 

> Left (~l0,c1,c2)(((~b)(!l0(c).c[b] | b(t;f).t[]))
    | l0[c1].c1(b).b(t;f).f[] | l0[c2].c2(b).(~t,f)(b[t;f] | t.o[] | f.p[]))

> Right (~l0,c1)(((~b)(!l0(c).c[b] | b(t;f).t[]))
    | l0[c1].c1(b).b(t;f).f[]) | (~l0,c2)(((~b)(!l0(c).c[b] | b(t;f).t[]))
    | l0[c2].c2(b).(~t,f)(b[t;f] | t.o[] | f.p[]))

The syntax for processes is quite natural: ~ stands for restriction, [] for 
output and () for input. We now check for bisimilarity: 

> Check 

The processes are not bisimilar. 

- case $\mathit{Sys}_2$: in this case, a new name b is created each time the list is 
interrogated at $l_0$, and hence processes A and E cannot interfere. Even in the case 
where an infinite number of processes interrogate the list, we can prove that 
$\mathit{Sys}_2$ and $\mathit{Sys}'_2$ are bisimilar: 

> Left (~l0)(((!l0(c).(~b)c[b].b(t;f).t[])) | l0[c1].c1(b).b(t;f).f[]
    | !l0[c2].c2(b).(~t)(~f)(b[t;f] | t.o[] | f.p[]))

> Right (~l0)(((!l0(c).(~b)c[b].b(t;f).t[]))
    | l0[c1].c1(b).b(t;f).f[]) | (~l0)(((!l0(c).(~b)c[b].b(t;f).t[]))
    | !l0[c2].c2(b).(~t)(~f)(b[t;f] | t.o[] | f.p[]))

> Check 

The processes are bisimilar. 

When the tool performs this verification, it uses the up to parallel compo- 
sition proof technique to erase copies of the cell each time they appear in 
both terms: an infinite proof is thus replaced by a finite one. 
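The difference between $\mathit{Sys}_1$ and $\mathit{Sys}_2$ can be mimicked by a toy Python model. It is neither the π-calculus semantics nor what the tool computes. Under the $R_1$ policy the cell hands out the same location on every query, while under $R_2$ it hands out a fresh one. Any process listening on a location may answer a request sent there. The function and helper names are hypothetical.

```python
import itertools
from typing import Dict, List, Set


def answers_seen_by_A(shared: bool) -> Set[str]:
    """Values agent A may read when the cell is shared (R1) or not (R2)."""
    fresh = itertools.count()
    listeners: Dict[str, List[str]] = {}   # location -> possible repliers

    def query_l0() -> str:
        # R1 returns the same location b to everyone; R2 creates a new one.
        loc = "b" if shared else f"b{next(fresh)}"
        listeners.setdefault(loc, []).append("true")   # the cell answers true
        return loc

    loc_e = query_l0()                     # E interrogates the cell once ...
    listeners[loc_e].append("false")       # ... then listens on b, answering false
    loc_a = query_l0()                     # A interrogates the cell
    return set(listeners[loc_a])           # anyone listening on loc_a may reply


print(sorted(answers_seen_by_A(shared=True)))    # ['false', 'true']
print(sorted(answers_seen_by_A(shared=False)))   # ['true']
```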

The latter example exploits the up to parallel composition proof technique to 
handle the representation of data structures in the π-calculus. In this context, 
our tool has been useful in dealing with non-trivial properties of processes, and it 




On the Benefits of Using the Up-To Techniques for Bisimulation Verification 






should be stressed that other existing systems cannot handle replicated processes 
like those we manipulate here. 

Furthermore, it should be remarked that security properties like the one 
we have considered can be addressed using type systems for the π-calculus. As 
shown for example in [San97], an easy way to prevent E from “pretending” to 
be R by locating the value false at b is to forbid processes that interact with R 
to use b in input subject position (as E does). 

4 Second Case Study: the Alternating Bit Protocol 

We now focus on another example: the well-known alternating bit protocol, which 
is probably the most widely used benchmark for verification systems in the field 
of concurrency. Our purpose here is not to study it per se, but instead to use it 
to shed light on issues related to the behaviour of encodings in the π-calculus. 
To this end, we shall need expansion between processes: 

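Definition 9 (Expansion) We write $\Rightarrow$ for the reflexive transitive closure of 
$\xrightarrow{\tau}$, $\stackrel{\mu}{\Longrightarrow}$ for $\Rightarrow\xrightarrow{\mu}\Rightarrow$, and $\xrightarrow{\hat\mu}$ for $\xrightarrow{\mu}$ if $\mu \neq \tau$, and for $=$ or $\xrightarrow{\tau}$ otherwise. 

We say that a relation $\mathcal{R}$ is an expansion iff, whenever $P \mathrel{\mathcal{R}} Q$, $P \xrightarrow{\mu} P'$ 
implies that there exists $Q'$ s.t. $Q \xrightarrow{\hat\mu} Q'$ and $P' \mathrel{\mathcal{R}} Q'$, and $Q \xrightarrow{\mu} Q'$ implies that 
there exists $P'$ s.t. $P \stackrel{\mu}{\Longrightarrow} P'$ and $P' \mathrel{\mathcal{R}} Q'$. In this case, we write $P \lesssim Q$ for $P \mathrel{\mathcal{R}} Q$. 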
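On a finite transition system, the relations of Definition 9 and the expansion condition can be computed directly. The sketch follows Definition 9. A move of the left process is matched by the right one with at most one step, and a move of the right process is matched by a weak move of the left one, so the left process may take extra τ steps. The explicit graph and the label "tau" for τ are assumptions of the illustration.

```python
from typing import Dict, List, Set, Tuple

LTS = Dict[str, List[Tuple[str, str]]]
TAU = "tau"


def tau_closure(lts: LTS, s: str) -> Set[str]:
    """States reachable from s by zero or more tau steps (the relation =>)."""
    seen, stack = {s}, [s]
    while stack:
        for a, t in lts.get(stack.pop(), []):
            if a == TAU and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def weak(lts: LTS, s: str, mu: str) -> Set[str]:
    """Targets of => -mu-> =>."""
    result: Set[str] = set()
    for s1 in tau_closure(lts, s):
        for a, s2 in lts.get(s1, []):
            if a == mu:
                result |= tau_closure(lts, s2)
    return result


def hat(lts: LTS, s: str, mu: str) -> Set[str]:
    """Targets of -mu-> if mu is not tau; zero or one tau step otherwise."""
    succ = {t for a, t in lts.get(s, []) if a == mu}
    return succ | {s} if mu == TAU else succ


def is_expansion(lts: LTS, rel: Set[Tuple[str, str]]) -> bool:
    """Left moves answered with at most one step on the right;
    right moves answered by weak moves on the left."""
    for p, q in rel:
        if not all(any((p2, q2) in rel for q2 in hat(lts, q, mu))
                   for mu, p2 in lts.get(p, [])):
            return False
        if not all(any((p2, q2) in rel for p2 in weak(lts, p, mu))
                   for mu, q2 in lts.get(q, [])):
            return False
    return True


# An implementation with one internal step versus its specification.
lts: LTS = {"i0": [(TAU, "i1")], "i1": [("a", "i2")], "s0": [("a", "s1")]}
impl_spec = {("i0", "s0"), ("i1", "s0"), ("i2", "s1")}
print(is_expansion(lts, impl_spec))                          # True
print(is_expansion(lts, {(q, p) for p, q in impl_spec}))     # False
```

The second check fails because the specification's a-move cannot be answered by the implementation's initial state in at most one step. This asymmetry is what makes expansion a preorder rather than an equivalence.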

$\lesssim$ is a preorder on processes that is more realistic than (strong) bisimulation 
for reasoning about larger case studies such as protocols. It allows one to prove 
that a given term “respects” a behaviour, modulo some extra $\tau$ steps (given 
by transitions $\Rightarrow$): typically, the duality between specification and implementation can be 
expressed using expansion. Our treatment of the up-to techniques is 
straightforwardly adapted to handle expansion; the user of our tool can choose between 
a strong and a weak mode, corresponding to relations $\sim$ and $\lesssim$ respectively. 

We first informally introduce the protocol, then present an encoding adapted
from [Mam98], and finally discuss the encodings we use in order to perform the
correctness proof of the protocol automatically. Our purpose in the latter part is
to show that defining encodings for verification differs from the usual problem
(i.e. the study of expressiveness), as exemplified by our case study.

4.1 The Protocol 

Figure 3 presents the entities involved in the communication protocol: a first
agent receives a message (whose content is not taken into account here) on
channel acc. It then transmits this message on a channel called trans to a second
agent. But channel trans is unreliable, and some messages transiting on it may
get lost: to handle this, a boolean tag is associated with the message being sent
on this channel, and the first agent keeps resending this information on trans. Upon
reception, the second agent transmits the message on deli and is willing to send
back the boolean to the first agent on ackn, as an acknowledgment. Since channel
ackn is lossy as well, the second agent has to do this repeatedly. When
the acknowledgment eventually arrives, a new cycle can start with the negated
boolean, so that the first agent can tell when the message has actually been
received.

Fig. 3. The Alternating Bit Protocol

The π-calculus processes* implementing this protocol are defined in Figure
4 (again, our encoding is adapted from [Mam98]): the first agent's behaviour
is given by processes Send and Wait, while the second agent corresponds to
process Receive; message losses on trans and ackn are represented by process
Noise, which sends a signal on a channel loss whenever a signal gets lost (one
could see these synchronisations on loss as modelling a timer mechanism
that detects losses). Note that the agents are parametrised by two booleans: this
will be discussed below. Finally, process Specif is an ideal specification of the
protocol's behaviour: it receives a signal on acc, sends one on deli, and starts
again by sending a signal on the trigger channel c.



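$\mathit{Send} \overset{\text{def}}{=} \,!\mathit{send}(b_1, b_2).\mathit{acc}.\mathit{trans}[b_2; b_1].\mathit{wait}[b_2; b_1]$

$\mathit{Wait} \overset{\text{def}}{=} \,!\mathit{wait}(b_1, b_2).(\mathit{ackn}(b_1, b_2).\mathit{send}[b_1; b_2] + \mathit{loss}.\mathit{trans}[b_1; b_2].\mathit{wait}[b_1; b_2])$

$\mathit{Receive} \overset{\text{def}}{=} \,!\mathit{rec}(b_1, b_2).\mathit{trans}(b_1', b_2').\ \mathbf{if}\ b_1 = b_1'\ \mathbf{then}\ (\mathit{ackn}[b_1; b_2] \mid \mathit{rec}[b_1; b_2])\ \mathbf{else}\ \overline{\mathit{deli}}.(\mathit{ackn}[b_2; b_1] \mid \mathit{rec}[b_2; b_1])$

$\mathit{Noise} \overset{\text{def}}{=} \,!\mathit{noise}.(\mathit{trans}(b, b').\overline{\mathit{loss}}.\overline{\mathit{noise}} + \mathit{ackn}(b, b').\overline{\mathit{loss}}.\overline{\mathit{noise}})$

$\mathit{System} \overset{\text{def}}{=} (\nu\, \mathit{trans}, \mathit{ackn}, \mathit{send}, \mathit{wait}, \mathit{rec}, \mathit{loss}, \mathit{noise}, b_1, b_2)$
$\qquad (\mathit{Send} \mid \mathit{Wait} \mid \mathit{Receive} \mid \mathit{Noise} \mid\ !b_1\ \mathit{true} \mid\ !b_2\ \mathit{false} \mid \mathit{send}[b_1; b_2] \mid \mathit{rec}[b_1; b_2] \mid \overline{\mathit{noise}})$

$\mathit{Specif} \overset{\text{def}}{=} (\nu c)(!c.\mathit{acc}.\overline{\mathit{deli}}.\bar{c} \mid \bar{c})$

Fig. 4. Modelling the Alternating Bit Protocol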



* The language we use here is actually value-passing CCS; for the encodings discussed
below, however, we shall need the full power of the π-calculus.
4.2 The Encodings Discussion

Choice. To model the alternating bit protocol in our system, we have to deal
with two constructs that are not present in our language, namely operators
on booleans and the choice operator. Well-known encodings of these constructs
exist (see [Mil91,Nes97]); these encodings stem from theoretical studies, and have
been implemented, e.g., in the programming language PICT [PT97]. We shall see
here the problems that arise when we try to use them in the context of verification,
and how they can be treated.

Various encodings of guarded choice are investigated in [NP96,Nes97]; the
general idea is to represent each branch of the choice by a parallel component,
and to implement a lock mechanism that prevents the other branches from firing once one
branch has committed. In general, once a choice has been taken, the "dead"
branches remain and, depending on the encoding we choose, are "more or less
alive"; when it comes to automatic verification, we need to be able to erase these
branches syntactically, to prevent our relation from growing. We have chosen to
encode the binary choice of Figure 4 as follows ($\llbracket\cdot\rrbracket$ denotes the encoding
function):

$\llbracket a(x).P + b(y).Q \rrbracket \overset{\text{def}}{=} (\nu l)(\bar{l} \mid a(x).l.\bar{b}\langle x\rangle.P \mid b(y).l.\bar{a}\langle y\rangle.Q)$.

This encoding is an adaptation of those described in [NP96]: a
lock channel $l$ is created, and when one branch is chosen, it deactivates the other
branch by consuming its head prefix, thus leading to a subterm of the form
$(\nu l)(l.T)$, which can immediately be garbage collected using the additional law
for structural congruence (see Section 1). Our simplified encoding is correct due
to an important property of the terms we consider: each time a choice construct
is encountered in a run of the protocol (agents Wait and Noise), at most one
process can interact with the branches involved in the choice (that is, we cannot
have simultaneous emissions on ackn and loss, nor on ackn and trans). Thus, it
cannot be the case that both branches commit before the lock $l$ is consumed, and
the untaken branch can be safely deactivated before the chosen branch proceeds.
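The following toy model shows only the effect of the lock, not the π-calculus encoding itself: it is sequential, the class `LockedChoice` and the branch names are hypothetical, and it presupposes the property just stated (the two branches never try to commit at the same time). The first branch to fire consumes the lock and deactivates the other one, which can then be discarded.

```python
class LockedChoice:
    """Sequential toy model of a lock-based binary choice (hypothetical names)."""

    def __init__(self, branches):
        self.lock_available = True     # the initial output on the lock channel l
        self.alive = set(branches)     # branches that may still commit

    def fire(self, branch):
        """Try to commit `branch`; return True if it is allowed to proceed."""
        if branch not in self.alive or not self.lock_available:
            return False
        self.lock_available = False    # consume the lock l
        self.alive = {branch}          # the untaken branch is deactivated (garbage)
        return True

choice = LockedChoice({"a", "b"})
print(choice.fire("a"), choice.fire("b"), choice.alive)  # True False {'a'}
```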

Note also that the simplest encoding of guarded choice, defined as follows

$\llbracket a(x).P + b(y).Q \rrbracket \overset{\text{def}}{=} (\nu l)(\bar{l} \mid l.a(x).P \mid l.b(y).Q)$,

and called internal choice in [NP96], cannot be adopted here: the choice of
the committing branch depends on the context (since at most one branch can
commit), and therefore cannot be made internally before synchronisation.

The original encoding of [NP96] could not be chosen, since it allows the 
computation to proceed before the dead branches have been deactivated. 

Booleans. We have already seen how booleans are encoded in the π-calculus.
The encoding presented in [Mam98] uses a single boolean as parameter for the
various agents, and the negation operator for the recursive calls in Send and
Receive. We could have encoded the latter operator as

$\llbracket \neg b \rrbracket(t, f) \overset{\text{def}}{=} (\nu t', f')(b[t'; f'] \mid t'.\bar{f} \mid f'.\bar{t})$;   (*)

this encoding is not convenient for the purpose of verification, because each time
the protocol loops, the negation operator adds new terms to the system,
making the relation grow ad infinitum (intuitively, $\neg\neg b$ does not
reduce to $b$). We therefore use a trick: all the agents are parametrised by
two booleans, which are exchanged whenever we want to make a recursive
call with the negated bit (see Figure 4).
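A rough way to see why the two-boolean trick keeps the state space finite: the sketch below (hypothetical functions, with terms modelled as plain strings and tuples) records how the current bit is expressed after a number of protocol rounds. Wrapping a fresh negation each round yields a new term every time, whereas exchanging two parameters only ever produces two configurations.

```python
def negation_encoding(rounds):
    """Bit term after `rounds` loops if each recursive call adds a negation process."""
    bit = "b"
    for _ in range(rounds):
        bit = f"not({bit})"        # a fresh negation wrapper every round
    return bit

def swap_encoding(rounds):
    """Parameter pair after `rounds` loops if the two booleans are exchanged."""
    pair = ("b1", "b2")
    for _ in range(rounds):
        pair = (pair[1], pair[0])  # no new term: the two names are swapped
    return pair

distinct_neg = {negation_encoding(k) for k in range(10)}
distinct_swap = {swap_encoding(k) for k in range(10)}
print(len(distinct_neg), len(distinct_swap))  # 10 2
```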

We also have to handle the test construct in agent Receive: here again,
we take advantage of the additional law for structural congruence to get rid of
inactive terms once the test has taken place. We represent the boolean operator
"=" (which can be seen as the negation of the XOR operator) by:

$\llbracket b = b' \rrbracket(T, F) \overset{\text{def}}{=} (\nu t, f, t', f')(b[t; f] \mid b'[t'; f'] \mid t.(t'.T \mid f'.F) \mid f.(t'.F \mid f'.T))$.

Once the test has been performed, non-chosen branches are automatically garbage
collected in the normalisation process.
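As a functional analogy only (not π-calculus), the selector view of booleans used above can be sketched in Python: a boolean picks one of two continuations, negation swaps them, and the test "=" asks the first boolean and then the second with the continuations arranged so that equal answers select T. All names here are hypothetical. The last line mirrors the remark that ¬¬b does not reduce to b: the double negation behaves like `b_true` but is a different object.

```python
def b_true(t, f):
    """Encoding of true: select the first continuation."""
    return t

def b_false(t, f):
    """Encoding of false: select the second continuation."""
    return f

def b_not(b):
    """Negation: ask b with the two continuations swapped."""
    return lambda t, f: b(f, t)

def b_eq(b1, b2):
    """[b1 = b2](T, F): ask b1; each answer asks b2 so that equal answers give T."""
    return lambda T, F: b1(b2(T, F), b2(F, T))

for x in (b_true, b_false):
    for y in (b_true, b_false):
        print(x.__name__, y.__name__, b_eq(x, y)("T", "F"))

double_neg = b_not(b_not(b_true))
print(double_neg("t", "f"), double_neg is b_true)  # t False
```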

Having defined the encoding of the protocol into our simple language, we
provide our tool with the corresponding definitions, and the correctness property
$\mathit{Specif} \lesssim \mathit{System}$ is proved automatically.



Conclusion 

We have seen how the up-to techniques for bisimulation can be adapted for
the purpose of verification. The examples of persistent lists and of the proof
of the alternating bit protocol suggest that the task of dealing with encodings
in the π-calculus can be managed using these techniques. It is questionable,
though, whether one should adopt a standpoint analogous to the design choices
of PICT [PT97], where a full-size programming language is built by adding
successive layers to a simple core language through encodings of higher-level
constructs. Indeed, developing an implementation in a system built this way
adds transitions to the terms and calls for clever use of garbage collection (as seen
in the examples); such issues are critical in the field of verification. Nevertheless,
the experiments shown above are interesting from both a theoretical and
a practical point of view.

From a theoretical point of view, we have seen how applying theoreticians'
techniques (as the up-to methods originally are) to verification
gives extra insight into results such as the encodings of language constructs
or of data. A striking example is the straightforward definition of the negation
operator for booleans (see the equation marked (*)), which is catastrophic in
terms of state-space growth, as explained above.

From a practical point of view, this work seems encouraging for the use of the
up-to techniques in the field of verification. Of course, constructs like booleans
or the choice operator should probably be built into a "real size" system. However,
the task of specification in a verification tool usually involves some formalisations
that, on a larger scale, are quite akin to the encodings we have seen, at least
from a methodological point of view. In this approach, the up-to techniques have
proved helpful in terms of both efficiency and expressiveness.

Regarding future work, it could be interesting to study whether the up-to
techniques can be integrated into a tool like the Mobility WorkBench [VM94], which
is a much richer system than our prototype and moreover supports various
additional facilities. On its own, however, our tool can be enriched to
perform further experiments concerning the verification of π-calculus bisimilarity
results; larger case studies, for example, could be considered, or other
encodings. Work is also in progress to adapt the up-to bisimulation verification
method to open terms, i.e. terms possibly containing process variables, in order
to be able to prove not only bisimilarity results, but also general bisimilarity
laws. Finally, the usefulness of the up-to techniques within other
models of concurrency should also be investigated.
Abstract. We introduce an improved version of Lin's symbolic transition
graph with assignment (STGA). The distinguishing feature of our model is
that the assignment of a transition is performed after rather than before
the action. Consequently, it has two advantages over the original one: on
the one hand, most regular value-passing processes can be represented more
intuitively and compactly as such graphs; on the other hand, natural
definitions of symbolic double transitions can be given. The rules which
generate the improved STGAs from regular value-passing processes are
presented. The various versions (late/early, ground/symbolic) of strong
operational semantics and strong bisimulation are given for such graphs.
Our strong bisimulation algorithms are based on Lin's late strong
bisimulation algorithm, but are more concise and practical. Finally, the
improved STGAs are generalized to both symbolic observation graphs with
assignments and symbolic congruence graphs with assignments, so that weak
bisimulation equivalence and observation congruence, respectively, can be checked.



1 Introduction 

Process description languages are useful for specifying, designing and verifying
concurrent distributed systems [1,12]. Bisimulation provides an excellent semantic
theory for them [12]. Hence bisimulation checking, especially weak bisimulation
and observation congruence checking, is the central issue and a critical step
in applying such process description languages in practice. The transition graph
is a standard semantic model for finite-state processes: any regular process
expression of pure CCS can be represented as a finite transition graph. Based on
finite transition graphs and their variants, observation graphs and congruence
graphs, efficient algorithms for checking strong/weak bisimulation equivalences
and observation congruence have been proposed and used to build verification
tools [2,3].

For value-passing processes with an infinite data domain, infinite transition
graphs must be generated and compared in order to check bisimulation between
two processes; hence the existing tools are no longer applicable, except for the
so-called "data independent processes" [6]. To overcome this limitation, symbolic
transition graphs (STGs for short) were advocated in [4] as intuitive and finite
representations for a large number of value-passing processes, using symbolic
actions with boolean guards.

But even very simple processes such as $P(x) \Leftarrow c!x.P(2x + 1)$ cannot be
described by finite STGs. By introducing assignments into labels, the notion of
symbolic transition graph with assignment (STGA for short) and an algorithm
for late strong bisimulation were proposed in [10]. An edge there takes the form
$m \overset{b,\,\bar{x} := \bar{e},\,\alpha}{\longmapsto} n$. The intuition is that if the boolean condition $b$ is satisfied at node
$m$, then the assignment $\bar{x} := \bar{e}$ is performed and the result is used to evaluate the
action $\alpha$ (if it is an output action) and to update the data values of $\bar{x}$ at node
$n$. Thus the process term $P(x)$ can be depicted as the left STGA in Figure 1.
Figure 1: Two versions of STGAs for $P(x)$ (recoverable labels: node $m\{x\}$; original-STGA edges "true, $x := x$, $c!x$" and "true, $x := 2x + 1$, $c!x$"; improved-STGA edge "true, $c!x$, $x := 2x + 1$")

Such an STGA is indeed a finite representation of $P(x)$, although it looks somewhat
strange at first sight. In order to check weak bisimulation and observation
congruence along the lines of [8,9], we need a reasonable and natural
definition of symbolic double transitions between the nodes of STGAs.

Intuitively, we should infer $m \overset{b \wedge b'\theta,\ \theta'\theta,\ c!e'}{\Longmapsto} n$ from $m \overset{b,\ \theta,\ c!e}{\longmapsto}\overset{b',\ \theta',\ \tau}{\longmapsto} n$, where
$e'$ is a data expression satisfying the equation $e'\theta'\theta = e\theta$. For example, let $\theta$, $\theta'$
and $e$ be $x := 2x + 1$, $x := x^2 + 1$ and $x + 1$ respectively; then $e'[((2x+1)^2 + 1)/x] =
(2x + 1) + 1$, so $e' = \sqrt{x - 1} + 1$. In general, it is difficult, if not impossible, to
figure out $e'$. Hence we cannot give a reasonable definition of symbolic double
transitions for the original STGAs.
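The example can be checked numerically. The sketch below reads θ, θ′, e and the candidate e′ as functions of the single variable x (a semantic reading chosen for illustration; the function names are ours) and tests the equation e′θ′θ = eθ, i.e. e′((2x+1)² + 1) = (2x+1) + 1. It also shows that the candidate only works where 2x + 1 ≥ 0, which illustrates why such an e′ is hard to find in general.

```python
import math

def theta(x):      # θ  : x := 2x + 1  (assignment on the first edge)
    return 2 * x + 1

def theta_p(x):    # θ' : x := x^2 + 1 (assignment on the second edge)
    return x * x + 1

def e(x):          # e = x + 1, the output expression on the first edge
    return x + 1

def e_prime(y):    # candidate e' = sqrt(y - 1) + 1
    return math.sqrt(y - 1) + 1

def satisfies(x):
    """Check e'θ'θ = eθ at the value x, i.e. e'((2x+1)^2 + 1) == (2x+1) + 1."""
    return math.isclose(e_prime(theta_p(theta(x))), e(theta(x)))

print([satisfies(x) for x in (0, 0.5, 3)])  # [True, True, True]
print(satisfies(-1))                        # False: requires 2x + 1 >= 0
```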

However, if the assignment of a transition is performed after rather than
before the action, we obtain an improved version of STGA. In our model,
an edge takes the form $m \overset{b,\,\alpha,\,\bar{x} := \bar{e}}{\longmapsto} n$. It means that if $b$ is satisfied at node
$m$, then the action $\alpha$ is performed and the assignment $\bar{x} := \bar{e}$ is used to update
the data values of $\bar{x}$ at node $n$. Here the assignment no longer applies to $\alpha$, so
$P(x)$ can be expressed as the improved STGA on the right side of Figure 1.

Swapping the order of the assignment and the action is in fact the key point.
Based on our model, we can define symbolic double transitions over nodes, which
are given in Definitions 4.1 and 6.1. It is rather surprising that
most regular value-passing processes can be represented more intuitively and
compactly as improved STGAs. For example, given the following definitions
In Section 3, two versions of late operational semantics and late strong bisimulations
are given for such graphs. In Section 4, two versions of late weak bisimulation
and late observation congruence are defined; moreover, equivalent characterizations
of their symbolic versions are established. The notions of SOGA and
SCGA are also given in this section. Algorithms for checking late strong/weak
bisimulation equivalences and observation congruence are presented in Section 5.
Section 6 outlines the changes necessary to handle the early case. Finally, some
conclusions and future research directions are given in Section 7.



2 An Improved STGA Model 

For brevity, we assume that all data values allowed to pass through channels
are of the same type. $Val$ is the set of such data values, ranged over by $v$. We
also presuppose the following syntactic categories: $Var = \{x_0, x_1, x_2, \ldots\}$ is a
countably infinite set of data variables, ranged over by $x, y, z$. $PVar$ is a set of
predicate variables, ranged over by $X$. $DExp$ is a set of data expressions, ranged
over by $e$. $BExp$ is a set of boolean expressions, ranged over by $b$. $Chan$ is a set of
channel names, ranged over by $c$. Moreover, we assume that $Var \cup Val \subseteq DExp$,
and $e = e' \in BExp$ for any $e, e' \in DExp$. $BExp$ is closed under the usual
operators $\wedge$, $\vee$, $\to$ and $\forall$. If $V \subseteq Var$ is a finite set, we use $new(V)$ to denote
the least variable $x_i \in Var$ such that $x_i$ is not in $V$.
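The function $new(V)$ is simple to realise once variables are indexed. A minimal sketch, assuming variables are represented as the strings "x0", "x1", … (this representation is ours, not the paper's):

```python
def new(used):
    """Least variable x_i (by index i) not occurring in the finite set `used`.

    Variables are modelled as strings "x0", "x1", ...; termination is
    guaranteed because `used` is finite.
    """
    i = 0
    while f"x{i}" in used:
        i += 1
    return f"x{i}"

print(new({"x0", "x1", "x3"}))  # x2
print(new(set()))               # x0
```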

An evaluation $\rho \in Eval$ is a mapping from $Var$ to $Val$, and we use the
notation $\rho\{x \mapsto v\}$ to denote the evaluation which differs from $\rho$ only in that it
maps $x$ to $v$. Obviously $\rho(e) \in Val$ and $\rho(b) \in \{true, false\}$. We use $\rho \models b$ to
indicate $\rho(b) = true$, and $b \models b'$ to mean that $\rho \models b$ implies $\rho \models b'$ for any $\rho$. We will
also write $b = b'$ for $b \models b'$ and $b' \models b$. A finite set of boolean expressions $B$ is
called a $b$-partition if $\bigvee B = b$.
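The entailment $b \models b'$ quantifies over all evaluations, which cannot be enumerated when $Val$ is infinite. Purely to illustrate the definitions, the sketch below assumes two variables ranging over a small finite domain and represents boolean expressions as Python predicates over an evaluation (both assumptions of ours). It then checks entailment, equivalence and the $b$-partition condition $\bigvee B = b$ by enumeration. Note that, as in the definition, a $b$-partition need not consist of disjoint expressions.

```python
from itertools import product

# Assumption of this sketch: two variables over a small finite domain.
VARS = ("x", "y")
DOMAIN = range(-3, 4)

def evaluations():
    """All evaluations rho: VARS -> DOMAIN."""
    for values in product(DOMAIN, repeat=len(VARS)):
        yield dict(zip(VARS, values))

def entails(b, b2):
    """b |= b2: every evaluation satisfying b also satisfies b2."""
    return all(b2(rho) for rho in evaluations() if b(rho))

def equivalent(b, b2):
    """b = b2, i.e. mutual entailment."""
    return entails(b, b2) and entails(b2, b)

def is_partition(bs, b):
    """bs is a b-partition iff its disjunction is equivalent to b."""
    return equivalent(lambda rho: any(c(rho) for c in bs), b)

b = lambda rho: rho["x"] > 0
B = [lambda rho: rho["x"] > 0 and rho["y"] >= 0,
     lambda rho: rho["x"] > 0 and rho["y"] < 0]
print(is_partition(B, b), entails(lambda rho: rho["x"] > 1, b))  # True True
```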

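A substitution $\sigma \in Sub$ is a mapping from $Var$ to $DExp$. We write $[\bar{e}/\bar{x}]$ for
the substitution sending $\bar{x}$ to $\bar{e}$, with $\bar{x}$ and $\bar{e}$ having the same length. If $\sigma = [\bar{e}/\bar{x}]$
then $dom(\sigma) = \{\bar{x}\}$, $cod(\sigma) = fv(\bar{e})$ and $n(\sigma) = dom(\sigma) \cup cod(\sigma)$. $\emptyset$ denotes the
empty substitution, $\sigma|V$ the restriction of $\sigma$ to $V$, and $e\sigma$ the result of applying $\sigma$
to $e$. If $dom(\sigma_1) \cap dom(\sigma_2) = \emptyset$, then $\sigma_1 \cup \sigma_2$ is also a substitution. The composition
of substitutions $\sigma$ and $\sigma'$ is denoted by $\sigma\sigma'$, such that $(e)\sigma\sigma' = (e\sigma)\sigma'$, and
$\sigma[x \mapsto e]$ denotes the substitution which differs from $\sigma$ only in that it sends $x$ to
$e$. If $\sigma = [\bar{e}/\bar{x}]$, then the application of $\sigma$ to $\rho$ is defined by $\sigma\rho = \rho\{\bar{x} \mapsto \rho(\bar{e})\}$. It
is easy to see that $(\sigma\rho)(e) = \rho(e\sigma)$ and $\sigma\rho \models b$ iff $\rho \models b\sigma$.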
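The identities $(\sigma\rho)(e) = \rho(e\sigma)$ and $(e)\sigma\sigma' = (e\sigma)\sigma'$ can be checked on a concrete instance. The sketch below uses a tiny expression language of our own (nested tuples with variables, constants, addition and multiplication) and represents substitutions and evaluations as dictionaries; it is an illustration of the definitions, not part of the paper's framework.

```python
# Expressions: ("var", name), ("const", value), ("add", e1, e2), ("mul", e1, e2).
def apply_sub(expr, sigma):
    """e sigma: simultaneously replace each variable in dom(sigma) by its image."""
    tag = expr[0]
    if tag == "var":
        return sigma.get(expr[1], expr)
    if tag == "const":
        return expr
    return (tag, apply_sub(expr[1], sigma), apply_sub(expr[2], sigma))

def evaluate(expr, rho):
    """rho(e): the value of e under the evaluation rho."""
    tag = expr[0]
    if tag == "var":
        return rho[expr[1]]
    if tag == "const":
        return expr[1]
    left, right = evaluate(expr[1], rho), evaluate(expr[2], rho)
    return left + right if tag == "add" else left * right

def apply_to_eval(sigma, rho):
    """sigma rho = rho{x -> rho(sigma(x))}; all right-hand sides read the old rho."""
    updated = dict(rho)
    updated.update({v: evaluate(ex, rho) for v, ex in sigma.items()})
    return updated

def compose(sigma, sigma2):
    """sigma sigma2, characterised by (e) sigma sigma2 = (e sigma) sigma2."""
    composed = {v: apply_sub(ex, sigma2) for v, ex in sigma.items()}
    for v, ex in sigma2.items():
        composed.setdefault(v, ex)
    return composed

x, y = ("var", "x"), ("var", "y")
sigma = {"x": ("add", ("mul", ("const", 2), x), ("const", 1))}  # x := 2x + 1
sigma2 = {"y": ("add", x, ("const", 10))}                         # y := x + 10
e = ("add", ("mul", x, x), y)                                     # x*x + y
rho = {"x": 3, "y": 5}

print(evaluate(e, apply_to_eval(sigma, rho)), evaluate(apply_sub(e, sigma), rho))  # 54 54
print(evaluate(apply_sub(e, compose(sigma, sigma2)), rho),
      evaluate(apply_sub(apply_sub(e, sigma), sigma2), rho))                       # 62 62
```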

An assignment $\theta \in Assign$ has the form $\bar{x} := \bar{e}$, which can be identified with
the substitution $[\bar{e}/\bar{x}]$. Thus the above notations for substitutions also
apply to assignments. We write $\theta\backslash x$ for $\theta|(dom(\theta) - \{x\})$.

An action $\alpha \in Act$ is either a silent action $\tau$, an input action $c?x$, or an
output action $c!e$. We use $chan(\alpha)$ to denote the set of channel names occurring
in $\alpha$, and $fv(\alpha)$ and $bv(\alpha)$ the sets of free and bound variables of $\alpha$. $\alpha\sigma$ is defined
by $(c!e)\sigma = c!(e\sigma)$, and $\alpha\sigma = \alpha$ otherwise. We write $\alpha =_b \alpha'$ to mean that if
$\alpha = c!e$ then $\alpha' = c!e'$ and $b \models e = e'$; otherwise $\alpha = \alpha'$. We also assume that
$\hat{\tau} = \varepsilon$ and $\hat{\alpha} = \alpha$ if $\alpha \neq \tau$. A guarded action with assignment is a triple $(b, \alpha, \theta)$.
Definition 2.1 A symbolic transition graph with assignment (STGA for short)
is a rooted directed graph where each node $n$ is associated with a finite set of free
variables $fv(n)$ and each edge is labelled by a guarded action with assignment,
such that if $m \overset{b,\,\alpha,\,\bar{x} := \bar{e}}{\longmapsto} n$ is a labelled edge from $m$ to $n$, then $fv(b, \alpha, \bar{e}) \subseteq fv(m)$,
$fv(n) \subseteq fv(m) \cup bv(\alpha) \cup \{\bar{x}\}$ and $bv(\alpha) \cap (\{\bar{x}\} \cup fv(\bar{e})) = \emptyset$.

In the above definition, neither $\{\bar{x}\} \subseteq fv(m)$ nor $fv(n) \subseteq \{\bar{x}\}$ is required.
Thus identity assignments are unnecessary in the improved STGA, while they are
indispensable in the original STGA. The trivial boolean guard $true$ and the empty
assignment $\emptyset$ will be omitted from edges. The constraint $bv(\alpha) \cap (\{\bar{x}\} \cup
fv(\bar{e})) = \emptyset$ is imposed just to simplify the parallel composition rules and the
operational semantics of the improved STGAs.
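The three side conditions of Definition 2.1 can be checked mechanically once the free and bound variables of each label component are known. In the sketch below, the `Edge` record and `well_formed` are hypothetical names, and free variables are supplied explicitly rather than computed from expressions (a simplification). The well-formed example is our reading of the improved STGA for $P(x)$: a self-loop on a node with free variable $x$, labelled "true, $c!x$, $x := 2x + 1$".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    """Edge m --b, alpha, xs := es--> n with free/bound variables given explicitly."""
    src: str
    dst: str
    guard_fv: frozenset                  # fv(b)
    action_fv: frozenset                 # fv(alpha), e.g. {"x"} for c!x
    action_bv: frozenset                 # bv(alpha), e.g. {"x"} for c?x
    assigned: tuple = ()                 # the variables xs on the left of :=
    expr_fv: frozenset = frozenset()     # fv(es)

def well_formed(edge, fv):
    """Side conditions of Definition 2.1; `fv` maps each node to its free variables."""
    m, n = fv[edge.src], fv[edge.dst]
    lhs = set(edge.assigned)
    cond1 = (edge.guard_fv | edge.action_fv | edge.expr_fv) <= m
    cond2 = n <= m | edge.action_bv | lhs
    cond3 = not (edge.action_bv & (lhs | edge.expr_fv))
    return cond1 and cond2 and cond3

loop = Edge("m", "m", frozenset(), frozenset({"x"}), frozenset(),
            ("x",), frozenset({"x"}))                    # true, c!x, x := 2x + 1
bad = Edge("m", "m", frozenset(), frozenset(), frozenset({"x"}),
           ("x",), frozenset())                          # c?x while also assigning x
print(well_formed(loop, {"m": {"x"}}), well_formed(bad, {"m": {"x"}}))  # True False
```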

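Like [10], the parallel composition $(\mathcal{G} \parallel \mathcal{H})\backslash R$ of two improved STGAs $\mathcal{G}$
and $\mathcal{H}$ with disjoint data variable spaces is generated by the following simple
rules:

Par: $\dfrac{m \overset{b,\,\alpha,\,\bar{x} := \bar{e}}{\longmapsto} m'}{\langle m, n\rangle \overset{b,\,\alpha,\,\bar{x} := \bar{e}}{\longmapsto} \langle m', n\rangle}\quad chan(\alpha) \cap R = \emptyset$

Com: $\dfrac{m \overset{b_1,\,c?z,\,\bar{x} := \bar{e}_1}{\longmapsto} m' \qquad n \overset{b_2,\,c!e,\,\bar{y} := \bar{e}_2}{\longmapsto} n'}{\langle m, n\rangle \overset{b_1 \wedge b_2,\,\tau,\,\bar{x}, \bar{y}, z := \bar{e}_1, \bar{e}_2, e}{\longmapsto} \langle m', n'\rangle}$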
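Rule Com merges the two assignments and the value transfer into one simultaneous assignment; the side condition $bv(\alpha) \cap (\{\bar{x}\} \cup fv(\bar{e})) = \emptyset$ ensures that $z$ does not clash with it. A minimal sketch of the rule on a dictionary representation of edges (the representation, with guards and expressions as strings, and the function `com` are assumptions made for illustration):

```python
def com(edge_in, edge_out):
    """Rule Com: from m --b1, c?z, xs := es1--> m' and n --b2, c!e, ys := es2--> n',
    build <m,n> --b1 and b2, tau, (xs, ys, z) := (es1, es2, e)--> <m',n'>.

    Actions are triples (kind, channel, payload) with kind "?" or "!"; guards and
    expressions are plain strings. Returns None if the edges do not match.
    """
    (m, m2), (n, n2) = edge_in["nodes"], edge_out["nodes"]
    kind_in, chan_in, z = edge_in["action"]
    kind_out, chan_out, e = edge_out["action"]
    if kind_in != "?" or kind_out != "!" or chan_in != chan_out:
        return None
    return {
        "nodes": ((m, n), (m2, n2)),
        "guard": f"({edge_in['guard']}) and ({edge_out['guard']})",
        "action": ("tau", None, None),
        # One simultaneous assignment: every right-hand side, including the
        # transmitted value e, is evaluated in the state before the step.
        "assign": edge_in["assign"] + edge_out["assign"] + [(z, e)],
    }

left = {"nodes": ("m", "m1"), "guard": "true", "action": ("?", "c", "z"),
        "assign": [("x", "x + 1")]}
right = {"nodes": ("n", "n1"), "guard": "y > 0", "action": ("!", "c", "y"),
         "assign": [("y", "y - 1")]}
step = com(left, right)
print(step["nodes"], step["assign"])
# (('m', 'n'), ('m1', 'n1')) [('x', 'x + 1'), ('y', 'y - 1'), ('z', 'y')]
```

Because the assignment is simultaneous, $z$ receives the value of $y$ before the decrement, which is exactly what the output $c!y$ transmitted.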

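For any term $t$ in regular value-passing CCS, which is given by the following
BNF grammar:

$t ::= 0 \mid \alpha.t \mid b \to t \mid t + t \mid P(\bar{e})$

we can generate an improved STGA by means of the following rules:

• $\alpha.P(\bar{e}) \overset{true,\,\alpha,\,\bar{x} := \bar{e}}{\longmapsto} P(\bar{x})$, where $\alpha \in \{\tau, c!e'\}$.
• $\alpha.u \overset{true,\,\alpha,\,\emptyset}{\longmapsto} u$, where $\alpha \in \{\tau, c!e'\}$ and $u$ is not of the form $P(\bar{e})$.
• $c?y.u \overset{true,\,c?z,\,\emptyset}{\longmapsto} u[z/y]$, where $z = new(fv(c?y.u))$.
• $u \overset{b,\,\alpha,\,\theta}{\longmapsto} u'$ implies $b' \to u \overset{b \wedge b',\,\alpha,\,\theta}{\longmapsto} u'$.
• $u \overset{b,\,\alpha,\,\theta}{\longmapsto} u'$ implies $u + w \overset{b,\,\alpha,\,\theta}{\longmapsto} u'$ and $w + u \overset{b,\,\alpha,\,\theta}{\longmapsto} u'$.
• $t \overset{b,\,\alpha,\,\theta}{\longmapsto} t'$ implies $P(\bar{e}) \overset{b[\bar{e}/\bar{x}],\,\alpha[\bar{e}/\bar{x}],\,\widehat{\theta([\bar{x} := \bar{e}])}}{\longmapsto} t'$, where $\alpha \in \{\tau, c!e'\}$.
• $t \overset{b,\,c?y,\,\theta}{\longmapsto} t'$ implies $P(\bar{e}) \overset{b[\bar{e}/\bar{x}],\,c?z,\,\widehat{\theta([\bar{x} := \bar{e}]\backslash y)}}{\longmapsto} t'[z/y]$, where $z = new(n(\theta([\bar{x} := \bar{e}]\backslash y)) \cup fv(c?y.t'))$.

In the last two rules, $P(\bar{x}) \Leftarrow t$ is a definition.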

Here we use $\hat{\theta}$ to denote the result obtained by eliminating the identity
assignments from $\theta$. Like [10], we also adopt a lazy approach (the last two rules
above) to infer moves from a recursively defined process term $P(\bar{e})$. However,
there are two main differences between our generation rules and those of [10]:

(1) A case analysis of the action prefix $\alpha.u$ is performed. The first rule is the
key: it transforms the applied occurrence $P(\bar{e})$ of $P$ in $\alpha.P(\bar{e})$ directly into the
standard template $P(\bar{x})$, by introducing the assignment $\bar{x} := \bar{e}$ immediately.
In this way the graphical representation is faithful to the syntactic representation of
the process, and thus there is no semantic gap between the two kinds of
representation. Besides, the standard template $P(\bar{x})$ is easier to reuse, and hence
the generated STGA may be more compact. The examples in the Introduction
illustrate this clearly, and more extreme examples can be derived from them. Of course,
counter-examples also exist. For instance, consider the process $P(e_3)$, where
$P(x) \Leftarrow c!e_1.P(e_2)$ and $e_3 = e_2$. $P(e_3)$ can be represented as an original STGA
with one node $m\{x\}$ and one edge $m \overset{true,\,x := e_2,\,c!e_1}{\longmapsto} m$. The improved STGA for
$P(e_3)$ contains two nodes and two edges.

(2) A mechanism for renaming bound names by the function $new()$ is introduced
when dealing with the input action $c?y$. The significance of α-conversion can
be seen from the following example. Consider the process $M$ defined by

$M \Leftarrow c?x.N(x)$

$N(x) \Leftarrow a!x.N(x + 1) + b?y.N(y) + c?z.N(z) + c?v.N(v)$

It is hard to predict what would happen without α-conversion when generating the
STGA of $M$. This is precisely why α-conversion with $new()$ is also used
in [4]. To reduce the side effects of α-conversion, we introduce $[\bar{x} := \bar{e}]\backslash y$ in the
last rule to avoid unnecessary renaming of the bound name $y$.

Based on the above analysis and comparison, we believe that in most cases the
improved STGA representation of a process is more intuitive and concise
than the original one. Hence, when a bisimulation checking algorithm is applied
to improved STGAs, we expect its execution time to be shorter and
the returned predicate equation system to be simpler and easier to solve.



3 Late Strong Operational Semantics and Bisimulation

Since each node of an STGA $\mathcal{G}$ may be associated with a set of free variables,
we can provide two natural interpretations for it, as in [10]. A state $n_\rho$ consists of
a node $n$ and an evaluation $\rho$, where $\rho$ is restricted to $fv(n)$. One can use $\rho$ to
evaluate each outgoing edge from $n$, resulting in a ground transition. We use
$p, q$ to range over states. If $p$ is a state $n_\rho$, we use $p\{x \mapsto v\}$ to denote the state
$n_{\rho\{x \mapsto v\}}$.

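Definition 3.1 The late ground operational semantics is defined as the least
relation over states satisfying the following rules:

$\dfrac{m \overset{b,\,\tau,\,\theta}{\longmapsto} n \quad \rho \models b}{m_\rho \xrightarrow{\tau} n_{\theta\rho}} \qquad \dfrac{m \overset{b,\,c!e,\,\theta}{\longmapsto} n \quad \rho \models b}{m_\rho \xrightarrow{c!\rho(e)} n_{\theta\rho}} \qquad \dfrac{m \overset{b,\,c?x,\,\theta}{\longmapsto} n \quad \rho \models b}{m_\rho \xrightarrow{c?x} n_{\theta\rho}}$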
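The ground rules can be read as an interpreter step: check the guard in $\rho$, compute any output value in $\rho$, and only then apply the assignment ($\theta\rho$). A minimal sketch under our own representation (guards, expressions and assignment right-hand sides as Python functions of the evaluation; actions as tuples), replaying the improved STGA for $P(x)$ from $x = 0$:

```python
def ground_step(edge, rho):
    """One late ground transition from m_rho along `edge`, following Definition 3.1.

    edge = (guard, action, assignment), where guard and expressions are functions
    of rho, action is ("tau",), ("out", chan, expr) or ("in", chan, var), and
    assignment maps variables to expressions. Returns (label, new_rho), or None
    if rho does not satisfy the guard.
    """
    guard, action, assignment = edge
    if not guard(rho):
        return None
    # theta rho = rho{xs -> rho(es)}: all right-hand sides read the old rho
    new_rho = dict(rho)
    new_rho.update({v: rhs(rho) for v, rhs in assignment.items()})
    kind = action[0]
    if kind == "out":
        _, chan, expr = action
        return f"{chan}!{expr(rho)}", new_rho   # output value computed in rho
    if kind == "in":
        _, chan, var = action
        return f"{chan}?{var}", new_rho         # late: var is instantiated later
    return "tau", new_rho

# Improved STGA for P(x) <= c!x.P(2x + 1): a loop labelled true, c!x, x := 2x + 1.
loop = (lambda r: True,
        ("out", "c", lambda r: r["x"]),
        {"x": lambda r: 2 * r["x"] + 1})
rho, labels = {"x": 0}, []
for _ in range(4):
    label, rho = ground_step(loop, rho)
    labels.append(label)
print(labels, rho)  # ['c!0', 'c!1', 'c!3', 'c!7'] {'x': 15}
```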

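Definition 3.2 A late strong ground bisimulation is a symmetric relation $R$
over states such that if $(m_\rho, n_\delta) \in R$ then

1. $m_\rho \xrightarrow{c?x} m'_{\rho'}$ implies there exists $n_\delta \xrightarrow{c?y} n'_{\delta'}$ such that $(m'_{\rho'\{x \mapsto v\}}, n'_{\delta'\{y \mapsto v\}}) \in R$ for all $v \in Val$.
2. For any other action $a \in \{\tau, c!v\}$, $m_\rho \xrightarrow{a} m'_{\rho'}$ implies there exists $n_\delta \xrightarrow{a} n'_{\delta'}$ such that $(m'_{\rho'}, n'_{\delta'}) \in R$.

We write $m_\rho \sim_L n_\delta$ if there is a late strong ground bisimulation $R$ such that
$(m_\rho, n_\delta) \in R$. Two STGAs $\mathcal{G}$ and $\mathcal{G}'$ are late strong ground bisimilar with
respect to $\rho$ if $r_\rho \sim_L r'_\rho$, where $r$ and $r'$ are the root nodes of the two graphs.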

We can also give an abstract operational semantics to STGAs. A term $n_\sigma$, also
written as $(n, \sigma)$, consists of a node $n$ and a substitution $\sigma$. We use $t, u$ to range
over terms. If $t$ is a term $n_\sigma$, we use $t[x \mapsto z]$ to denote the term $n_{\sigma[x \mapsto z]}$,
$fv(t)$ the set $fv(fv(n)\sigma)$ of free variables, and $t_\rho$ the state $n_{\sigma\rho}$.

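Definition 3.3 The late symbolic operational semantics is defined as the least
relation over terms satisfying the following rules:

$\dfrac{m \overset{b,\,\tau,\,\theta}{\longmapsto} n}{m_\sigma \xrightarrow{b\sigma,\,\tau} n_{\theta\sigma}} \qquad \dfrac{m \overset{b,\,c!e,\,\theta}{\longmapsto} n}{m_\sigma \xrightarrow{b\sigma,\,c!e\sigma} n_{\theta\sigma}} \qquad \dfrac{m \overset{b,\,c?x,\,\theta}{\longmapsto} n}{m_\sigma \xrightarrow{b\sigma,\,c?z} n_{(\theta\sigma)[x \mapsto z]}}$

where $z \notin fv(m_\sigma) \cup fv((fv(n) - \{x\})\theta\sigma)$.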

Definition 3.4 Let $S = \{S^b \mid b \in BExp\}$ be a boolean-expression-indexed family
of symmetric relations over terms. $S$ is a late strong symbolic bisimulation if
$(t, u) \in S^b$ implies:

whenever $t \xrightarrow{b_1,\,\alpha} t'$ with $bv(\alpha) \cap fv(t, u, b) = \emptyset$, then there is a $b \wedge b_1$-partition
$B$ with $fv(B) \subseteq fv(t, u, b)$ such that for each $b' \in B$ there exists a $u \xrightarrow{b_2,\,\alpha'} u'$
such that $b' \models b_2$, $\alpha =_{b'} \alpha'$ and $(t', u') \in S^{b'}$.

We write $t \sim_L^b u$ if there is a late strong symbolic bisimulation $S$ such that
$(t, u) \in S^b$. Two STGAs $\mathcal{G}$ and $\mathcal{G}'$ are late strong symbolic bisimilar over $b$ if
$r_\emptyset \sim_L^b r'_\emptyset$, where $r$ and $r'$ are the root nodes of the two graphs.

The following theorem underlines the significance of symbolic bisimulation. Its
proof follows the lines of Theorem 4.5 in [4].

Theorem 3.5 $m_\sigma \sim_L^b n_{\sigma'}$ if and only if $m_{\sigma\rho} \sim_L n_{\sigma'\rho}$ for every $\rho$ such that $\rho \models b$.

4 Late Weak Bisimulation and Observation Congruence 

To define the late ground/symbolic double arrow relations, we first define the 
late symbolic double transitions over nodes. 

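Definition 4.1 The late symbolic double transitions are defined as the least
relations over nodes satisfying the following rules:

• $m \overset{true,\,\varepsilon,\,\emptyset}{\Longmapsto}_L m$.
• $m \overset{b,\,\alpha,\,\theta}{\longmapsto} n$ implies $m \overset{b,\,\hat{\alpha},\,\theta}{\Longmapsto}_L n$.
• $m \overset{b,\,\tau,\,\theta}{\longmapsto}\overset{b',\,\alpha,\,\theta'}{\Longmapsto}_L n$ implies $m \overset{b \wedge b'\theta,\,\alpha\theta,\,\theta'\theta}{\Longmapsto}_L n$.
• If $\alpha$ is not of the form $c?x$, then $m \overset{b,\,\alpha,\,\theta}{\Longmapsto}_L\overset{b',\,\tau,\,\theta'}{\longmapsto} n$ implies $m \overset{b \wedge b'\theta,\,\alpha,\,\theta'\theta}{\Longmapsto}_L n$.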
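The point of putting the assignment after the action is that the last two rules of Definition 4.1 only need to compose existing labels, with no equation such as $e'\theta'\theta = e\theta$ to solve. The sketch below is a semantic illustration: guards, outputs and assignments are Python functions of an evaluation, actions are represented only by the value they output, and the function names are ours. Composing a τ-step before or after a visible step yields guard $b \wedge b'\theta$ and assignment $\theta'\theta$ (apply $\theta$, then $\theta'$), and the output is read after $\theta$ only when the τ-step comes first.

```python
def after_tau(tau_edge, edge):
    """Third rule: m --b,tau,theta--> . ==b',alpha,theta'==> n
    gives m ==b and b'theta, alpha theta, theta' theta==> n."""
    b, _, theta = tau_edge
    b2, alpha, theta2 = edge
    return (lambda rho: b(rho) and b2(theta(rho)),   # b ∧ b'θ: b' read after θ
            lambda rho: alpha(theta(rho)),           # αθ: output read after θ
            lambda rho: theta2(theta(rho)))          # θ'θ: apply θ, then θ'

def before_tau(edge, tau_edge):
    """Fourth rule (alpha not an input): m ==b,alpha,theta==> . --b',tau,theta'--> n
    gives m ==b and b'theta, alpha, theta' theta==> n."""
    b, alpha, theta = edge
    b2, _, theta2 = tau_edge
    return (lambda rho: b(rho) and b2(theta(rho)),
            alpha,                                   # α is unchanged
            lambda rho: theta2(theta(rho)))

dec = (lambda r: r["x"] > 0, None, lambda r: {**r, "x": r["x"] - 1})   # x > 0, tau, x := x - 1
out = (lambda r: True, lambda r: r["x"], lambda r: {**r, "y": r["x"]})  # true, c!x, y := x
rho = {"x": 3, "y": 0}
g, a, s = after_tau(dec, out)
print(g(rho), a(rho), s(rho))  # True 2 {'x': 2, 'y': 2}
g, a, s = before_tau(out, dec)
print(g(rho), a(rho), s(rho))  # True 3 {'x': 2, 'y': 3}
```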
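Definition 4.2 The late ground double arrows are defined as the least relations
over states satisfying the following rules:

$\dfrac{m \overset{b,\,\varepsilon,\,\theta}{\Longmapsto}_L n \quad \rho \models b}{m_\rho \overset{\varepsilon}{\Longrightarrow}_L n_{\theta\rho}} \qquad \dfrac{m \overset{b,\,c!e,\,\theta}{\Longmapsto}_L n \quad \rho \models b}{m_\rho \overset{c!\rho(e)}{\Longrightarrow}_L n_{\theta\rho}} \qquad \dfrac{m \overset{b,\,c?x,\,\theta}{\Longmapsto}_L n \quad \rho \models b}{m_\rho \overset{c?x}{\Longrightarrow}_L n_{\theta\rho}}$

Definition 4.3 A late weak ground bisimulation is a symmetric relation $R$
over states such that $(m_\rho, n_\delta) \in R$ implies:

1. Whenever $m_\rho \xrightarrow{c?x} m'_{\rho'}$, there exists $n_\delta \overset{c?y}{\Longrightarrow}_L n''_{\delta''}$ such that for all $v \in Val$ there exists $n''_{\delta''\{y \mapsto v\}} \overset{\varepsilon}{\Longrightarrow}_L n'_{\delta'}$ with $(m'_{\rho'\{x \mapsto v\}}, n'_{\delta'}) \in R$.
2. Whenever $m_\rho \xrightarrow{a} m'_{\rho'}$ for any other action $a \in \{\tau, c!v\}$, then $n_\delta \overset{\hat{a}}{\Longrightarrow}_L n'_{\delta'}$ for some $n'_{\delta'}$ with $(m'_{\rho'}, n'_{\delta'}) \in R$.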



We write $m_\rho \approx_L n_\delta$ if there is a late weak ground bisimulation $R$ such that
$(m_\rho, n_\delta) \in R$. Two STGAs $\mathcal{G}$ and $\mathcal{G}'$ are late weak ground bisimilar with respect
to $\rho$ if $r_\rho \approx_L r'_\rho$, where $r$ and $r'$ are the root nodes of $\mathcal{G}$ and $\mathcal{G}'$, respectively.

The late ground observation congruence $\simeq_L$ is then defined in terms of $\approx_L$ as
usual. To define the symbolic version of late weak bisimulation and late
observation congruence, we first introduce the late symbolic double arrows.



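Definition 4.4 The late symbolic double arrows are defined as the least relations
over terms satisfying the following rules:

$\dfrac{m \overset{b,\,\varepsilon,\,\theta}{\Longmapsto}_L n}{m_\sigma \overset{b\sigma,\,\varepsilon}{\Longrightarrow}_L n_{\theta\sigma}} \qquad \dfrac{m \overset{b,\,c!e,\,\theta}{\Longmapsto}_L n}{m_\sigma \overset{b\sigma,\,c!e\sigma}{\Longrightarrow}_L n_{\theta\sigma}} \qquad \dfrac{m \overset{b,\,c?x,\,\theta}{\Longmapsto}_L n}{m_\sigma \overset{b\sigma,\,c?z}{\Longrightarrow}_L n_{(\theta\sigma)[x \mapsto z]}}$

where $z \notin fv(m_\sigma) \cup fv((fv(n) - \{x\})\theta\sigma)$.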






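Lemma 4.5 If $t \overset{b,\,\alpha}{\Longrightarrow}_L t'$, then one of the following cases holds:

• $b = true$, $\alpha = \varepsilon$ and $t' = t$.
• If $\alpha \neq \varepsilon$ then $t \xrightarrow{b,\,\alpha} t'$; otherwise $t \xrightarrow{b,\,\tau} t'$.
• There exist $t''$, $b_1$, $b_2$ such that $t \xrightarrow{b_1,\,\tau} t'' \overset{b_2,\,\alpha}{\Longrightarrow}_L t'$ and $b = b_1 \wedge b_2$.
• If $\alpha$ is not of the form $c?x$, then there exist $t''$, $b_1$, $b_2$ such that $t \overset{b_1,\,\alpha}{\Longrightarrow}_L t'' \xrightarrow{b_2,\,\tau} t'$ and $b = b_1 \wedge b_2$.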



Definition 4.6 Let $S = \{S^b \mid b \in BExp\}$ be a boolean-expression-indexed
family of symmetric relations over terms. $S$ is a late weak symbolic bisimulation
if $(t, u) \in S^b$ implies:

whenever $t \xrightarrow{b_1,\,\alpha} t'$ with $bv(\alpha) \cap fv(t, u, b) = \emptyset$, then there is a $b \wedge b_1$-partition
$B$ with $fv(B) \subseteq fv(t, u, b)$ such that for each $b' \in B$ there exists a $u \overset{b_2,\,\hat{\alpha}'}{\Longrightarrow}_L u'$
such that $b' \models b_2$, $\alpha =_{b'} \alpha'$ and

• if $\alpha = c?x$, then there is a $b'$-partition $B'$ such that for each $b'' \in B'$ there exists a $u' \overset{b_3,\,\varepsilon}{\Longrightarrow}_L u''$ such that $b'' \models b_3$ and $(t', u'') \in S^{b''}$;
• otherwise $(t', u') \in S^{b'}$.

We write $t \approx_L^b u$ if there is a late weak symbolic bisimulation $S$ such that
$(t, u) \in S^b$. Two STGAs $\mathcal{G}$ and $\mathcal{G}'$ are late weak symbolic bisimilar over $b$ if
$r_\emptyset \approx_L^b r'_\emptyset$, where $r$ and $r'$ are the root nodes of the two graphs.

The late symbolic observation congruence $\simeq_L^b$ is defined in terms of $\approx_L^b$ as
usual. The two versions of late weak bisimulation equivalence and late observation
congruence can be related as in the case of late strong bisimulation.

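Theorem 4.7 (Soundness and Completeness of $\approx_L^b$ and $\simeq_L^b$)

1. $m_\sigma \approx_L^b n_{\sigma'}$ if and only if $m_{\sigma\rho} \approx_L n_{\sigma'\rho}$ for every $\rho \models b$.

2. $m_\sigma \simeq_L^b n_{\sigma'}$ if and only if $m_{\sigma\rho} \simeq_L n_{\sigma'\rho}$ for every $\rho \models b$.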

We can give an equivalent characterization of late weak symbolic bisimulation
solely in terms of late symbolic double arrows. It underlies our algorithm for
checking late weak bisimulation.

Theorem 4.8 Let $S = \{S^b \mid b \in BExp\}$ be a boolean-expression-indexed family
of symmetric relations over terms. $S$ is a late weak symbolic bisimulation if and
only if for any $(t, u) \in S^b$ and $\alpha \in \{\varepsilon, c!e, c?x\}$:

whenever $t \overset{b_1,\,\alpha}{\Longrightarrow}_L t'$ with $bv(\alpha) \cap fv(t, u, b) = \emptyset$, then there is a $b \wedge b_1$-partition
$B$ with $fv(B) \subseteq fv(t, u, b)$ such that for each $b' \in B$ there exists a
$u \overset{b_2,\,\alpha'}{\Longrightarrow}_L u'$ such that $b' \models b_2$, $\alpha =_{b'} \alpha'$ and

• if $\alpha = c?x$, then there is a $b'$-partition $B'$ such that for each $b'' \in B'$ there exists a $u' \overset{b_3,\,\varepsilon}{\Longrightarrow}_L u''$ such that $b'' \models b_3$ and $(t', u'') \in S^{b''}$;
• otherwise $(t', u') \in S^{b'}$.

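To establish the equivalent definition of late symbolic observation congruence,
we must introduce the positive relations $\overset{b,\,\alpha,\,\theta}{\Longmapsto}_L^{+}$ (resp. $\overset{b,\,\alpha}{\Longrightarrow}_L^{+}$), which differ from
$\overset{b,\,\alpha,\,\theta}{\Longmapsto}_L$ (resp. $\overset{b,\,\alpha}{\Longrightarrow}_L$) only in that they exclude the reflexive case.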

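Theorem 4.9 Two terms $t, u$ are late symbolic observation congruent with
respect to $b$, i.e. $t \simeq_L^b u$, if and only if for any $\alpha \in \{\varepsilon, c!e, c?x\}$:

whenever $t \overset{b_1,\,\alpha}{\Longrightarrow}_L^{+} t'$ with $bv(\alpha) \cap fv(t, u, b) = \emptyset$, then there is a $b \wedge b_1$-partition $B$
with $fv(B) \subseteq fv(t, u, b)$ such that for each $b' \in B$ there exists a $u \overset{b_2,\,\alpha'}{\Longrightarrow}_L^{+} u'$
such that $b' \models b_2$, $\alpha =_{b'} \alpha'$ and

• if $\alpha = c?x$, then there is a $b'$-partition $B'$ such that for each $b'' \in B'$ there exists a $u' \overset{b_3,\,\varepsilon}{\Longrightarrow}_L u''$ such that $b'' \models b_3$ and $t' \approx_L^{b''} u''$;
• otherwise $t' \approx_L^{b'} u'$.

And similarly for $u$.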
Based on Definition 4.1, Theorem 4.8 and Theorem 4.9, we can generalize
the notions of observation graph and congruence graph of [2] to the symbolic
level, in order to check weak bisimulation and observation congruence for finite STGAs.
We use $ND(\mathcal{G})$ and $E(\mathcal{G})$ to denote the sets of nodes and edges of $\mathcal{G}$ respectively.

Given an STGA $\mathcal{G}$, its late SOGA $\mathcal{G}^O$ is defined by $ND(\mathcal{G}^O) = ND(\mathcal{G})$ and
$E(\mathcal{G}^O) = \{m \overset{b,\,\alpha,\,\theta}{\Longmapsto}_L n \mid m, n \in ND(\mathcal{G})$ and $\alpha \in \{\varepsilon, c!e, c?x\}\}$. Since $m \overset{true,\,\varepsilon,\,\emptyset}{\Longmapsto}_L m$,
there will be one such self ε-looping edge for each node in $\mathcal{G}^O$.

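The late SCGA $\mathcal{G}^C$ is a variant of $\mathcal{G}^O$ that records the possibility of initial
τ-actions. Let $r$ be the root node of $\mathcal{G}$; then $\mathcal{G}^C$ can be constructed as follows:

i) if $r$ has no incoming edges, $\mathcal{G}^C$ is obtained simply by excising the self
ε-looping edge of $r$ from $\mathcal{G}^O$;

ii) otherwise, we first construct a new STGA $\mathcal{G}'$ with a new root node $r'$ such
that $fv(r') = fv(r)$, $ND(\mathcal{G}') = ND(\mathcal{G}) \cup \{r'\}$ and $E(\mathcal{G}') = E(\mathcal{G}) \cup \{r' \overset{b,\,\alpha,\,\theta}{\longmapsto} n \mid
r \overset{b,\,\alpha,\,\theta}{\longmapsto} n \in E(\mathcal{G})\}$. Obviously $r \sim r'$ and $r'$ has no incoming edges. Then
$\mathcal{G}^C$ is obtained by excising the self ε-looping edge of $r'$ from $\mathcal{G}'^O$.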
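Step (ii) is a purely structural transformation, so it can be sketched independently of how the double transitions of the SOGA are computed. In the sketch below (hypothetical functions; nodes map to their free-variable sets, edges are `(src, label, dst)` triples, and `"eps"` stands for ε), a fresh root copies every outgoing edge of the old root, so it has no incoming edges even when the old root lies on a cycle; the second function performs the final excision of the root's ε self-loop from a given SOGA edge list.

```python
def split_root(nodes, edges, root, new_root):
    """Step (ii): add a fresh root with the same free variables and a copy of every
    outgoing edge of `root`; the new root therefore has no incoming edges."""
    if new_root in nodes:
        raise ValueError("the new root must be a fresh node")
    new_nodes = dict(nodes)
    new_nodes[new_root] = set(nodes[root])
    copied = [(new_root, label, dst) for src, label, dst in edges if src == root]
    return new_nodes, edges + copied

def excise_root_eps_loop(soga_edges, root):
    """Remove the self epsilon-looping edge of `root` from a SOGA edge list."""
    return [(s, l, d) for s, l, d in soga_edges
            if not (s == root and d == root and l == "eps")]

nodes = {"r": {"x"}, "n": {"x"}}
edges = [("r", "tau", "n"), ("n", "c!x", "r")]        # r has an incoming edge
new_nodes, new_edges = split_root(nodes, edges, "r", "r'")
print(new_edges)  # [('r', 'tau', 'n'), ('n', 'c!x', 'r'), ("r'", 'tau', 'n')]
print(excise_root_eps_loop([("r'", "eps", "r'"), ("r'", "tau", "n")], "r'"))
# [("r'", 'tau', 'n')]
```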

The symbolic approach used in this paper can avoid completely the state 
explosion due to the instantiation of input variables. However, G^ and G^ may 
be infinite graphs if G contains T-cycles with assignments. Since our algorithm 
for weak bisimulation and observation congruence is based on finite SOGAs and 
SGGAs, let us consider which kinds of G can produce finite G'^ and G^' 

(1) If there is no $\tau$-cycle in $\mathcal{G}$, then $\mathcal{G}^{\Rightarrow}$ and $\mathcal{G}^{\Rightarrow^+}$ are finite.

(2) If no edge of any $\tau$-cycle of $\mathcal{G}$ contains assignments, we can also generate finite $\mathcal{G}^{\Rightarrow}$ and $\mathcal{G}^{\Rightarrow^+}$. Moreover, under certain conditions, we can even eliminate such $\tau$-cycles in advance.

(3) As shown in Figure 3, a $\tau$-cycle may contain edges with assignments, yet its execution terminates because the values of the data variables change. In this way, data expressions are manipulated by means of assignments and controlled $\tau$-cycles. If an upper bound on the number of executions of such a $\tau$-cycle can be calculated, and this bound is independent of the assigned variables, then we can generate finite $\mathcal{G}^{\Rightarrow}$ and $\mathcal{G}^{\Rightarrow^+}$ as well. For instance, the upper bound is $x$ (or $\sqrt{x}$) for the left STGA in Figure 3, and moreover it is independent of the assigned variables $y_1$, $y_2$ and $y_3$.
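Figure 3 is not reproduced here, but its left $\tau$-cycle can be read off from the equation for $X_{2,2}$ given in Section 5. The reading is an assumption on our part: the cycle starts with $y_1, y_2, y_3 = 0, 1, 1$ and repeats $y_1, y_2, y_3 := y_1+1, y_2+2, y_3+y_2+2$ while $y_3 < x$. The sketch below counts the executions and checks that the count never exceeds $\lfloor\sqrt{x}\rfloor \le x$, a bound that depends on $x$ alone.

```python
import math


def left_cycle_iterations(x):
    """Count executions of the tau-cycle (reconstructed, hypothetical encoding)."""
    y1, y2, y3 = 0, 1, 1
    count = 0
    while y3 < x:
        # simultaneous assignment: y3 uses the old value of y2
        y1, y2, y3 = y1 + 1, y2 + 2, y3 + y2 + 2
        count += 1
    return count


# After k executions y3 = (k + 1)^2, so the count is bounded by isqrt(x).
for x in range(0, 5000):
    assert left_cycle_iterations(x) <= math.isqrt(x) <= x
```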

Fortunately, by investigating a large number of examples, we found that the above three kinds of STGAs suffice to describe many practical concurrent communicating systems, such as network protocols. How to check weak bisimulation and observation congruence directly on STGAs, rather than on SOGAs and SCGAs, deserves further research.

5 The Algorithms for Bisimulations 

A late strong bisimulation algorithm for STGAs was presented in [10]. Based on the improved STGA model, we propose a more concise and practical algorithm for late strong bisimulation, and further extend the framework to deal with late weak bisimulation and late observation congruence. We refer readers to [10] for a detailed account of notions such as predicate equation systems.

The algorithm for late strong bisimulation is given in Figure 4. Its input is a pair of finite STGAs, $\mathcal{G}$ and $\mathcal{H}$, with disjoint name spaces for data variables. Its output is a predicate equation system. Starting from the pair of root nodes $(r, r')$ of $\mathcal{G}$ and $\mathcal{H}$, the function close introduces a predicate variable $X_{m,n}$ and the corresponding equation for each pair of matching nodes $(m, n)$. As a result, the formulation and correctness proof of the algorithm are clearer and simpler than those of the algorithm of [10], while the additional predicate variables can be eliminated simply by substitution. Note that each $X_{m,n}$ always has the formal parameters $fv(m) \cup fv(n)$, listed in order as in $\mathcal{G}$ and $\mathcal{H}$. Since $fv(m) \cap fv(n) = \emptyset$, we write $X_{m,n}(\sigma \cup \sigma')$ for $X_{m,n}(\sigma, \sigma')$. Hence each applied occurrence of $X_{m,n}$ is written as $X_{m,n}(\sigma)$, which means $X_{m,n}(fv(m)\sigma, fv(n)\sigma)$.

    bisim(𝒢, ℋ) = { PES := ∅ }
                  close(r, r', ∅, ∅)
                  return(PES)

    close(m, n, σ, W) =
        if (m, n) ∉ W
        then { PES := PES ∪ {X_{m,n} = match(m, n, W)} }
        if W ≠ ∅
        then return(X_{m,n}(σ))

    match(m, n, W) = ⋀ { match_γ(m, n, W) | γ ∈ NAType(m, n) }

    match_τ(m, n, W) =
        let B_ij = close(m_i, n_j, θ_i ∪ θ'_j, W ∪ {(m, n)})
              for m --(b_i, τ, θ_i)--> m_i,  n --(b'_j, τ, θ'_j)--> n_j
        in ⋀_i (b_i → ⋁_j (b'_j ∧ B_ij)) ∧ ⋀_j (b'_j → ⋁_i (b_i ∧ B_ij))

    match_c!(m, n, W) =
        let B_ij = close(m_i, n_j, θ_i ∪ θ'_j, W ∪ {(m, n)})
              for m --(b_i, c!e_i, θ_i)--> m_i,  n --(b'_j, c!e'_j, θ'_j)--> n_j
        in ⋀_i (b_i → ⋁_j (b'_j ∧ e_i = e'_j ∧ B_ij)) ∧ ⋀_j (b'_j → ⋁_i (b_i ∧ e_i = e'_j ∧ B_ij))

    match_c?(m, n, W) =
        let z = newVar()
            B_ij = close(m_i, n_j, θ_i[x ↦ z] ∪ θ'_j[y ↦ z], W ∪ {(m, n)})
              for m --(b_i, c?x, θ_i)--> m_i,  n --(b'_j, c?y, θ'_j)--> n_j
        in ⋀_i (b_i → ⋁_j (b'_j ∧ ∀z B_ij)) ∧ ⋀_j (b'_j → ⋁_i (b_i ∧ ∀z B_ij))

Figure 4: The Algorithm for Late Strong Bisimulation

Compared to the algorithm of [10], another important advantage of our algorithm is that it avoids finding all matching loop entries in advance. Instead, the functions close and match are extended with a parameter $W$ that records the pairs of matching nodes already searched. This makes our algorithm more practical when applied to complex STGAs. In Figure 4, $NAType(m,n)$ is the set of types of the actions that appear in the next transitions from $m$ and $n$. The types of the actions $\tau$, $c!e$ and $c?x$ are $\tau$, $c!$ and $c?$, respectively.
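To make the control structure of Figure 4 concrete, here is a minimal Python sketch of bisim, close and match restricted to $\tau$ and $c!e$ edges (input edges are omitted for brevity). The encoding is hypothetical. An STGA is a dict from node to a list of edges (guard, action, assignment, target), where the action is `("tau",)` or `("out", channel, expr)`. Guards, expressions and assignments are plain strings, and formulas are only built as strings, never solved. As in the figure, the parameter `w` stops the recursion on pairs already being searched.

```python
def conj(parts):
    parts = list(parts)
    return " ∧ ".join(f"({p})" for p in parts) if parts else "true"


def disj(parts):
    parts = list(parts)
    return " ∨ ".join(f"({p})" for p in parts) if parts else "false"


def action_type(action):
    """tau has type tau; c!e has type c!, encoded as ("out", c)."""
    return "tau" if action[0] == "tau" else ("out", action[1])


def bisim(g, h, root_g, root_h):
    pes = {}  # (m, n) -> right-hand side of the equation for X_{m,n}

    def close(m, n, sigma, w):
        if (m, n) not in w:
            pes[(m, n)] = match(m, n, w)
        # the top-level call ignores this value, as in Figure 4 (W = ∅)
        return f"X_{m},{n}({sigma})"

    def match(m, n, w):
        types = {action_type(e[1]) for e in g[m] + h[n]}  # NAType(m, n)
        return conj(match_type(t, m, n, w) for t in sorted(types, key=str))

    def match_type(t, m, n, w):
        left = [e for e in g[m] if action_type(e[1]) == t]
        right = [e for e in h[n] if action_type(e[1]) == t]
        w2 = w | {(m, n)}
        b = {}
        for i, (_, act_i, th_i, m_i) in enumerate(left):
            for j, (_, act_j, th_j, n_j) in enumerate(right):
                b_ij = close(m_i, n_j, f"{th_i} ∪ {th_j}", w2)
                if t != "tau":  # c!e: the emitted values must agree
                    b_ij = f"{act_i[2]} = {act_j[2]} ∧ {b_ij}"
                b[i, j] = b_ij
        fwd = [f"{left[i][0]} → " + disj(f"{right[j][0]} ∧ {b[i, j]}" for j in range(len(right)))
               for i in range(len(left))]
        bwd = [f"{right[j][0]} → " + disj(f"{left[i][0]} ∧ {b[i, j]}" for i in range(len(left)))
               for j in range(len(right))]
        return conj(fwd + bwd)

    close(root_g, root_h, "∅", frozenset())
    return pes
```

On a pair of single-node output loops, the inner close call finds `(a, b)` already in `w` and returns a reference to `X_a,b` instead of recursing. The result is one recursive equation, just as in the figure.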

The correctness of the algorithm is guaranteed by the following theorem. 

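Theorem 5.1 Let $PES = \{X_{m,n} = \Lambda_{m,n}\}$ be the predicate equation system returned by the algorithm on $\mathcal{G}$ and $\mathcal{H}$.

(1) If $\eta$ is a symbolic solution of $PES$ and $b \Rightarrow \eta(X_{m,n})(\sigma \cup \sigma')$, then $m_\sigma \sim_L^b n_{\sigma'}$.

(2) If $\eta$ is the greatest symbolic solution of $PES$ and $m_\sigma \sim_L^b n_{\sigma'}$, then $b \Rightarrow \eta(X_{m,n})(\sigma \cup \sigma')$.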

For instance, the simplified equation system $PES$ returned by the algorithm for the two STGAs in Figure 3 is:

$$\begin{aligned}
X_{0,0} &= \forall z\, X_{1,1}(z, z)\\
X_{1,1} &= (x < 0 \wedge u < 0) \vee \big(x \ge 0 \wedge u \ge 0 \wedge X_{2,2}(x, 0, 1, 1, u, 0)\big)\\
X_{2,2} &= \big(y_3 < x \wedge (v+1)^2 < u \wedge X_{2,2}(x, y_1+1, y_2+2, y_3+y_2+2, u, v+1)\big)\\
&\quad \vee \big(y_3 \ge x \wedge (v+1)^2 \ge u \wedge y_1 = v \wedge X_{3,3}\big)\\
X_{3,3} &= true
\end{aligned}$$

where the formal parameter sets of $X_{0,0}$, $X_{1,1}$, $X_{2,2}$ and $X_{3,3}$ are $\{\}$, $\{x, u\}$, $\{x, y_1, y_2, y_3, u, v\}$ and $\{\}$, respectively. By mathematical induction and the fact that $1 + 3 + 5 + \cdots + (2n-1) = n^2$, we can conclude that $X_{0,0} = true$, i.e. the two STGAs in Figure 3 are late strong bisimilar.
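The induction argument can be sanity-checked numerically. The sketch below unfolds the equations for concrete integer values and checks $X_{0,0}$ over a finite range of $z$. It is only a spot check, not a proof. The unfolding of $X_{2,2}$ always terminates here because $y_3$ strictly increases, so evaluating the recursion directly gives the same value as the greatest solution. The function names are ours.

```python
def x22(x, y1, y2, y3, u, v):
    """Unfold X_{2,2} for concrete values; X_{3,3} = true."""
    while y3 < x and (v + 1) ** 2 < u:
        # here the second disjunct is false (y3 < x), so X_{2,2} equals
        # its recursive call with the updated arguments
        y1, y2, y3, v = y1 + 1, y2 + 2, y3 + y2 + 2, v + 1
    # the first disjunct no longer applies; only the second can hold
    return y3 >= x and (v + 1) ** 2 >= u and y1 == v


def x11(x, u):
    return (x < 0 and u < 0) or (x >= 0 and u >= 0 and x22(x, 0, 1, 1, u, 0))


def x00(domain):
    """X_{0,0} = forall z. X_{1,1}(z, z), checked only over a finite domain."""
    return all(x11(z, z) for z in domain)


print(x00(range(-100, 10_000)))  # prints True
```

The check succeeds because $y_3 = (v+1)^2$ and $y_1 = v$ are invariants of the unfolding, which is exactly what the induction establishes.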

Given two STGAs $\mathcal{G}$ and $\mathcal{H}$, the algorithm for computing late weak symbolic bisimulation (resp. late symbolic observation congruence) is similar to the one presented in Figure 4, but works on $\mathcal{G}^{\Rightarrow}$ and $\mathcal{H}^{\Rightarrow}$ (resp. $\mathcal{G}^{\Rightarrow^+}$ and $\mathcal{H}^{\Rightarrow^+}$). As Definition 4.4, Theorem 4.8 and Theorem 4.9 indicate, we need not only to replace $match_\tau$ with $match_\varepsilon$ but also to modify $match_{c?}$ to:

    match_c?(m, n, W) =
        let z = newVar()
            B_ijk = close(m_i, n_jk, θ_i[x ↦ z] ∪ θ'_jk(θ'_j[y ↦ z]), W ∪ {(m, n)})
              for m --(b_i, c?x, θ_i)--> m_i,  n --(b'_j, c?y, θ'_j)--> n_j --(b'_jk, ε, θ'_jk)--> n_jk
            B_ilj = close(m_il, n_j, θ_il(θ_i[x ↦ z]) ∪ θ'_j[y ↦ z], W ∪ {(m, n)})
              for m --(b_i, c?x, θ_i)--> m_i --(b_il, ε, θ_il)--> m_il,  n --(b'_j, c?y, θ'_j)--> n_j
        in ⋀_i (b_i → ⋁_j (b'_j ∧ ∀z ⋁_k (b'_jk θ'_j[y ↦ z] ∧ B_ijk))) ∧
           ⋀_j (b'_j → ⋁_i (b_i ∧ ∀z ⋁_l (b_il θ_i[x ↦ z] ∧ B_ilj)))

The correctness of the modified algorithm is guaranteed by a theorem similar to Theorem 5.1. These algorithms reduce the problem of checking bisimulations to the problem of computing the greatest solution of predicate equation systems. Unlike boolean equation systems, which can be solved effectively [7,11], the greatest solution of a predicate equation system cannot in general be computed automatically. However, one can reason about the greatest solution using knowledge of the data domain, as in the above example.
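To see why the boolean case is effective, consider a finite set of boolean variables whose right-hand sides are monotone. The greatest solution is then reached by iterating from the all-true valuation until nothing changes. The sketch below is plain Kleene iteration, a simple stand-in rather than the algorithms of [7,11]. Encoding each right-hand side as a Python function of the current valuation is our own assumption.

```python
def greatest_solution(eqs):
    """Greatest solution of X_i = F_i(X) over booleans, for monotone F_i."""
    val = {x: True for x in eqs}
    # Values can only decrease from True to False, so at most len(eqs)
    # rounds change anything, plus one round to detect stability.
    for _ in range(len(eqs) + 1):
        new = {x: rhs(val) for x, rhs in eqs.items()}
        if new == val:
            return val
        val = new
    raise ValueError("no fixed point reached: are the right-hand sides monotone?")


sol = greatest_solution({
    "X": lambda v: v["Y"] and v["X"],  # a cycle with no failing guard
    "Y": lambda v: v["X"],
    "Z": lambda v: v["W"],
    "W": lambda v: False,
})
print(sol["X"], sol["Z"])  # prints True False
```

The same iteration does not carry over to predicate equation systems, because their variables range over predicates on data rather than over booleans.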
6 The early case



In this section we briefly outline how to deal with the early case by modifying 
the previous semantic theory and checking algorithms. It turns out that only 
those parts concerning input actions need changing. 

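For the early ground operational semantics, we only need to change the rule concerning guarded input with assignment in Definition 3.1 to

$$\frac{m \stackrel{b,c?x,\theta}{\longmapsto} n \qquad \rho \models b \qquad v \in Val}{m_\rho \stackrel{c?v}{\longrightarrow} n_{\theta(\rho[x \mapsto v])}}$$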

Hence the definition of early strong ground bisimulation can be obtained from 
Definition 3.2 by merging the first clause into the second one. 

The early symbolic operational semantics is the same as in the late case, while the definition of early strong symbolic bisimulation is obtained from Definition 3.4 by replacing the condition $fv(B) \subseteq fv(t, u, b)$ with $fv(B) \subseteq fv(t, u, b) \cup bv(\alpha)$. This means that the early version allows partitioning over the value space of an input variable, whereas the late version prohibits it.

Because early bisimulation allows a single input move of one process to be matched by several such moves of the other, there is no danger in letting input moves absorb the $\tau$ moves that follow them. Hence the early symbolic double transitions are defined as follows.

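Definition 6.1 The early symbolic double transitions are defined as the least relations over nodes satisfying the following rules:

• $m \stackrel{true,\varepsilon,\emptyset}{\Longrightarrow}_E m$.

• If $m \stackrel{b,\alpha,\theta}{\longmapsto} n$ and $\alpha \in \{\tau, c!e\}$, then $m \stackrel{b,\alpha,\theta}{\Longrightarrow}_E n$.

• If $m \stackrel{b,c?x,\theta}{\longmapsto} n$, then $m \stackrel{b,true,c?x,\theta}{\Longrightarrow}_E n$.

• If $m \stackrel{b,\tau,\theta}{\longmapsto}\stackrel{b',\alpha,\theta'}{\Longrightarrow}_E n$ and $\alpha \in \{\varepsilon, \tau, c!e\}$, then $m \stackrel{b \wedge b'\theta,\ \alpha\theta,\ \theta'\theta}{\Longrightarrow}_E n$.

• If $m \stackrel{b,\tau,\theta}{\longmapsto}\stackrel{b_1,b_2,c?x,\theta'}{\Longrightarrow}_E n$, then $m \stackrel{b \wedge b_1\theta,\ b_2(\theta \backslash x),\ c?x,\ \theta'(\theta \backslash x)}{\Longrightarrow}_E n$.

• If $m \stackrel{b,\alpha,\theta}{\Longrightarrow}_E\stackrel{b',\tau,\theta'}{\longmapsto} n$ and $\alpha \in \{\varepsilon, \tau, c!e\}$, then $m \stackrel{b \wedge b'\theta,\ \alpha,\ \theta'\theta}{\Longrightarrow}_E n$.

• If $m \stackrel{b_1,b_2,c?x,\theta}{\Longrightarrow}_E\stackrel{b',\tau,\theta'}{\longmapsto} n$, then $m \stackrel{b_1,\ b_2 \wedge b'\theta,\ c?x,\ \theta'\theta}{\Longrightarrow}_E n$.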

For the path $m_0\{x\} \stackrel{x \ge 0,\ \tau,\ x:=x+1}{\longmapsto} m_1\{x\} \stackrel{true,\ c?x,\ \emptyset}{\longmapsto} m_2\{x\} \stackrel{x < 0,\ \tau,\ x:=x+4}{\longmapsto} m_3\{x\}$, we must introduce an early symbolic double transition $m_0\{x\} \stackrel{x \ge 0,\ x < 0,\ c?x,\ x:=x+4}{\Longrightarrow}_E m_3\{x\}$ to distinguish the free and bound occurrences of $x$.
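The need for two guards can be checked by brute force. The sketch below executes the path concretely for a starting value $x_0$ and an input value $v$. It compares the result with the double transition $(x \ge 0,\ x < 0,\ c?x,\ x:=x+4)$, where the first guard reads the free $x$ and the second guard and the assignment read the bound $x$. The guards are read as $x \ge 0$ and $x < 0$ (an assumption about the figure's notation), and the integer range is only a finite stand-in for $Val$.

```python
def run_path(x0, v):
    """Execute the path from x = x0, inputting v; None if a guard fails."""
    x = x0
    if not x >= 0:   # m0 -> m1 guard reads the free x
        return None
    x = x + 1        # m0 -> m1 assignment x := x + 1
    x = v            # m1 -> m2: c?x rebinds x to the input value
    if not x < 0:    # m2 -> m3 guard reads the bound x
        return None
    return x + 4     # m2 -> m3 assignment x := x + 4


def run_double(x0, v):
    """Double transition (x >= 0, x < 0, c?x, x := x + 4)."""
    if not (x0 >= 0 and v < 0):
        return None
    return v + 4


values = range(-20, 21)
assert all(run_path(a, v) == run_double(a, v) for a in values for v in values)
# A single guard over one x would be "x >= 0 and x < 0", which nothing satisfies,
# yet the path is enabled, e.g. for x0 = 1 and v = -1.
assert not any(x >= 0 and x < 0 for x in values)
assert run_path(1, -1) == 3
```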

The early ground double arrow is defined similarly to Definition 4.2, except that it is defined in terms of early symbolic double transitions rather than late symbolic double transitions, and the rule for input actions must be modified to

$$\frac{m \stackrel{b_1,b_2,c?x,\theta}{\Longrightarrow}_E n \qquad \rho \models b_1 \wedge b_2[v/x] \qquad v \in Val}{m_\rho \stackrel{c?v}{\Longrightarrow} n_{\theta(\rho[x \mapsto v])}}$$
Hence the definition of early weak ground bisimulation can be obtained from 
Definition 4.3 by merging the first clause into the second one. 



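For the same reason, the rule for input actions in the definition of the early symbolic double arrow is

$$\frac{m \stackrel{b_1,b_2,c?x,\theta}{\Longrightarrow}_E n}{m_\sigma \stackrel{b_1\sigma \wedge b_2\sigma[x \mapsto z],\ c?z}{\Longrightarrow}_E n_{\theta\sigma[x \mapsto z]}}$$

where $z \notin fv(m_\sigma) \cup fv((fv(n) - \{x\})\theta\sigma)$.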



Early weak symbolic bisimulation differs from the late one only for input moves: input moves can now absorb the $\tau$ moves that follow them. As a consequence, the two partitions used in Definition 4.6 can be combined into one, so the special clause for input moves can be merged into the other one. The same holds for early symbolic observation congruence.

Fortunately, we can also establish equivalent characterisations of early weak symbolic bisimulation (resp. early symbolic observation congruence) purely in terms of early symbolic double arrows $\stackrel{b,\alpha}{\Longrightarrow}_E$ (resp. $\stackrel{b,\alpha}{\Longrightarrow}{}^{+}_E$). Thus the only difference between early strong and early weak symbolic bisimulation is that the strong version is defined in terms of symbolic single arrows $\stackrel{b,\alpha}{\longrightarrow}$, while the weak version is defined in terms of early symbolic double arrows $\stackrel{b,\alpha}{\Longrightarrow}_E$.

The algorithm for late strong bisimulation can also be modified to check early strong bisimulation. As may be expected, we only need to change $match_{c?}$ to:

    match_c?(m, n, W) =
        let z = newVar()
            B_ij = close(m_i, n_j, θ_i[x ↦ z] ∪ θ'_j[y ↦ z], W ∪ {(m, n)})
              for m --(b_i, c?x, θ_i)--> m_i,  n --(b'_j, c?y, θ'_j)--> n_j
        in ∀z ( ⋀_i (b_i → ⋁_j (b'_j ∧ B_ij)) ∧ ⋀_j (b'_j → ⋁_i (b_i ∧ B_ij)) )
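The only change from Figure 4 is where $\forall z$ sits. The sketch below shows that this placement matters by evaluating both formulas over a finite stand-in for $Val$. The continuation table `B` is hypothetical. The left node has two input edges, and the right node has the same two plus a third one that behaves like the first left branch when $z = 0$ and like the second otherwise, assuming the two branches are not bisimilar to each other. All guards are taken to be true.

```python
def late_ok(bl, br, B, domain):
    """Figure 4 match_c?: the quantifier over z sits inside each disjunct."""
    fwd = all(not bi or any(bj and all(B(i, j, z) for z in domain)
                            for j, bj in enumerate(br))
              for i, bi in enumerate(bl))
    bwd = all(not bj or any(bi and all(B(i, j, z) for z in domain)
                            for i, bi in enumerate(bl))
              for j, bj in enumerate(br))
    return fwd and bwd


def early_ok(bl, br, B, domain):
    """Early match_c?: one outer quantifier over z for the whole formula."""
    def at(z):
        fwd = all(not bi or any(bj and B(i, j, z) for j, bj in enumerate(br))
                  for i, bi in enumerate(bl))
        bwd = all(not bj or any(bi and B(i, j, z) for i, bi in enumerate(bl))
                  for j, bj in enumerate(br))
        return fwd and bwd
    return all(at(z) for z in domain)


def B(i, j, z):
    """Hypothetical table: right edge 2 matches left edge 0 iff z == 0."""
    table = {(0, 0): True, (0, 1): False, (0, 2): z == 0,
             (1, 0): False, (1, 1): True, (1, 2): z != 0}
    return table[i, j]


left, right, values = [True, True], [True, True, True], range(-5, 6)
print(late_ok(left, right, B, values), early_ok(left, right, B, values))  # prints False True
```

The third right edge can be matched only by partitioning on the input value, which the early formula allows and the late formula forbids.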

Given two STGAs $\mathcal{G}$ and $\mathcal{H}$, we first construct the corresponding early SOGAs (resp. early SCGAs) $\mathcal{G}'$ and $\mathcal{H}'$. The checking algorithm for early weak symbolic bisimulation (resp. early symbolic observation congruence) is similar to the one presented in Figure 4, but works on $\mathcal{G}'$ and $\mathcal{H}'$. As indicated above, we need not only to replace $match_\tau$ with $match_\varepsilon$ but also to modify $match_{c?}$ to:

    match_c?(m, n, W) =
        let z = newVar()
            B_ij = close(m_i, n_j, θ_i([x ↦ z]) ∪ θ'_j([y ↦ z]), W ∪ {(m, n)})
              for m --(b_i1, b_i2, c?x, θ_i)--> m_i,  n --(b'_j1, b'_j2, c?y, θ'_j)--> n_j
        in ∀z ( ⋀_i (b_i1 ∧ b_i2[x ↦ z] → ⋁_j (b'_j1 ∧ b'_j2[y ↦ z] ∧ B_ij)) ∧
                ⋀_j (b'_j1 ∧ b'_j2[y ↦ z] → ⋁_i (b_i1 ∧ b_i2[x ↦ z] ∧ B_ij)) )
j i 
7 Conclusion

Based on the improved STGA model, we have presented algorithms to check strong and weak bisimulation equivalences and observation congruence for value-passing processes. These algorithms reduce the problems of checking bisimulations for STGAs to the problems of reasoning about the greatest solutions of predicate equation systems over data domains. In the future we would like to investigate techniques for verifying properties of such solutions. We are currently implementing all these algorithms in Standard ML, and hope to build an automatic verification tool of practical use for value-passing processes and the π-calculus.
Abstract. We present an experiment which demonstrates that methods and tools developed in the context of black-box conformance testing of communication protocols can be used efficiently to test the cache coherency protocol of a hardware multiprocessor architecture. We used the automatic conformance test generator TGV, developed by INRIA, to generate abstract tests, and we developed software to make them executable in the real test environment of Bull.

The hardware testing community has considered the TGV approach a serious alternative to the usual random test generation. It overcomes the well-known debugging and coverage problems associated with that kind of technique.



1 Introduction 

In this paper, we are concerned with so-called conformance testing, which consists in testing whether an implementation of a system behaves as described in its specification. Depending on the domain, there exist different kinds of methods and tools dedicated to conformance testing. In some cases, these different methods are similar to one another. In the experiment described in this paper, this is the case for hardware off-line testing and communication protocol conformance testing.

On one side of this end-to-end experiment, carried out in the context of the VASY (Validation of Systems) action within the Dyade / Bull-Inria R&D Joint Venture, there were engineers of Bull using their usual methodology to develop a multiprocessor architecture, called in the following the Bull’s CC_NUMA machine. In hardware design, the description of the system is often based on hardware description languages such as VHDL [1] or Verilog [2]. This is due to the ability of these languages to describe various levels, including hardware-related details such as the register-transfer, gate and switch levels. When one is particularly interested in high-level functionalities, such as cache coherency protocols, these details may lead to over-specification. Even though there are different abstraction levels (VHDL behavioral style), abstract synchronization mechanisms such as rendez-vous are not easily available. Moreover, the tools applied to the specifications (for verification, test generation, ...) may be unusable since they are needlessly complex. Therefore, one may wonder whether the formal specification languages and associated tools designed in another, analogous domain (such as computer networks) could be better suited to the description and testing of high-level functionalities [3]. We have chosen the LOTOS language for the formal specification of the Bull’s CC_NUMA architecture because its underlying semantic model is based on the rendez-vous synchronization mechanism, which is well suited to the specification of hardware entities [4] such as processors, memory controllers, bus arbiters, etc. The communications between these components, which send electrical signals on conductors, are easily described by interactions between LOTOS processes. Another reason for this choice is that LOTOS is a standard language, well known in the computer network community and often used for the description of communication protocols.

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 315-329, 1999.

On the other side, the prototype TGV was developed by Inria-Rennes to generate test cases for communication protocols using the black-box conformance testing approach. The main purpose of TGV is to fit industrial test-generation practice as closely as possible. Given a formal specification of the system to be tested and a formal description of a test purpose (which represents an abstract form of the property to be tested), TGV generates an abstract test case. This is a directed acyclic graph in which each path represents a test sequence with an associated verdict, which indicates whether the implementation under test (IUT) conforms to the specification or not [5]. A test case generated by TGV is interactive because each sequence is a series of interactions between the tester and the IUT, and an output of the tester depends on what it has previously observed from the IUT. Notice that this is not the case in hardware testing, which is rather “batch”: after stimulating the IUT, one observes its reactions and analyzes them afterwards. TGV has been tried out on the Drex military protocol [6] and on the SSCOP protocol [7]. Comparing hand-written test cases with those generated automatically by TGV has shown its interest and efficiency.

The aim of the experiment described in this paper is to demonstrate that the TGV tool, which was developed for conformance testing of communication protocols, can also be used efficiently to generate tests for hardware architectures. In a first step of this experiment, we showed that the testing activity done by hand can be done automatically using the TGV approach [9]. The main contribution of the results presented in this paper lies in the fact that, in this second and final step of our experiment, we have also demonstrated that “the interactive nature of conformance testing with TGV is advantageous for hardware testing, because it improves the quality of the tests and the test coverage”.

The following section describes the Bull’s CC_NUMA machine: its architecture, its cache coherency protocol, the test purposes and the hardware testing methodology habitually used. In the third section, we present the approach used to make the automatic generation of tests with TGV possible: the formal specification and verification of the CC_NUMA cache coherency protocol, and the formalization of the test purposes. The tools developed to make the generated tests executable in the real test environment of Bull are presented in section 4. This is followed in section 5 by a description of the advantages this experiment brings to both communities (network protocol conformance testing and hardware testing), and by a quantitative and qualitative analysis of the results. The conclusion gives some ideas on current and future work.



2 The Bull’s CC_NUMA machine: architecture and testing environment

2.1 The general architecture and the cache coherency protocol 

The Bull’s CC_NUMA architecture is a multiprocessor system based on a Cache-Coherent Non-Uniform Memory Architecture (CC-NUMA). It is derived from Stanford’s DASH multiprocessor machine and consists of a scalable interconnection of up to 8 modules (see Figure 1).




Fig. 1. The Bull’s CC_NUMA General Architecture 



The memory is distributed among different modules. Each module contains a set of up to 4 processors. The key feature of the Bull’s CC_NUMA architecture is its distributed, directory-based cache coherency protocol, which uses a Presence Cache and a Remote Cache in each module. The Presence Cache of a module is a cached directory that maps all the blocks cached outside the module. The global performance of the Bull’s CC_NUMA architecture is improved by the Remote Cache (RC), which locally stores the most recently used blocks retrieved from remote memories. A remote memory block can be in one of the following states: uncached, shared or modified, which correspond to the possible RC states (INV)alid, (SH)ared and (MOD)ified.

Thus, testing the Cache Coherency Protocol amounts to verifying that the states of the Presence Cache and the Remote Cache are always correctly updated during the execution of any transaction in the Bull’s CC_NUMA architecture.



2.2 The test purposes 

The experts of Bull, including the designers of the Bull’s CC_NUMA architecture (who know its weak points and the important properties to test), have written a document called the “test plan document”. It contains an informal description (in the form of tables with comments) of the main test purposes to be applied to the Bull’s CC_NUMA architecture. Seven Test Groups have been identified. In our experiment, we were interested in two Test Groups (Groups 3 and 4) concerning the test of the Cache Coherency protocol. An example of a test purpose describing an address collision situation is: “Module#1 requests a FLUSH transaction on the block address A0. The block address A0 is in Module#0. Verify that Module#0 accepts the incoming FLUSH transaction. CPU#0 of Module#0 executes a RWITM on the same address. Check the immediate address collision on block A0. Check also that the correct response is given by Module#0 and verify the good completion of the FLUSH transaction.”



2.3 The current testing architecture 

The testing environment of the Bull’s CC_NUMA architecture used in this experiment is called the SIM1 environment and is described in Figure 2. It consists of 3 modules connected by a Remote Interconnection Network. Each module is composed of Processor Behavioral Models (MPB Bus Model), a Memory Array and Memory Controller, an Arbiter and I/O Block, a Coherency Controller, and a Remote Cache Tag that contains the Tag of the Remote Cache and the Presence Cache. The simulation environment is composed of the description of the system (the 3 modules), the kernel event simulator (VSS kernel: VHDL Synopsys Simulator) and a front-end human interface (VHDL Debugger). The MPBgen application converts the MPB input command format (input files) into the intermediate format (input tables) readable by the MPBs. The probe VHDL module is then in charge of down-loading the desired output events (among those observed), and the VSS writes them to a file (PROBE.OUT file).

From the testing point of view, the system under test (called SUT in Figure 2) is seen as a black box. Thus, testing in this environment consists in specifying the input files and analyzing the probe output files.

The input files: There is one input file per MPB, and an input file describes a sequence of transactions to be executed by one CPU. The input files are currently written by hand according to the informal test purposes specified in the test plan document. The main difficulty in writing these files is the synchronization of the CPUs. All the transactions to be executed by one CPU are synchronized by using a “barrier” (the SYNC_CYC transaction) for any subsequent operation issued by the same CPU. When transactions are executed by several CPUs, these CPUs are synchronized by an operation that specifies a delay δ after which each transaction (one input file to its corresponding CPU) is executed in turn on the bus. The main difficulty of this synchronization mechanism is the estimation of δ, which is currently done empirically.

Fig. 2. The usual SIM1 testing environment

The probe output files: A probe output file is generated at each clock cycle if significant events happen (see Figure 2). For each module, it contains the sequence of actions actually executed in the system, together with the Presence Cache and Remote Cache states. Each line of this file describes one action, with a stamp corresponding to the starting time of its execution, and has the following form:

PROBE #0 > L_Bus 620 burst rwitm A0 Tag 00 addr=014000AA00 Pos_Ack Resp_Rerun at time 660 NS

This line means that the probe of Module#0 observes, at time 660 NS, a RWITM transaction on the local bus 620.
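Since verdicts are later derived by comparing such lines with expected actions, it helps to read them mechanically. The Python sketch below assumes that every probe line has exactly the shape of the single example above: a module number, a bus name and number, free-form fields, and a trailing `at time <n> NS`. That grammar is our assumption, not a documented format of the Bull environment, and the field names in the returned dict are hypothetical.

```python
import re

# Assumed line shape, inferred from the one example line in the text.
PROBE_RE = re.compile(
    r"^PROBE #(?P<module>\d+) > (?P<bus>\S+) (?P<bus_id>\d+) "
    r"(?P<body>.*?) at time (?P<time>\d+) NS$"
)


def parse_probe_line(line):
    """Split one probe line into module, bus, free-form fields and time."""
    m = PROBE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "module": int(m.group("module")),
        "bus": m.group("bus"),
        "bus_id": m.group("bus_id"),
        "fields": m.group("body").split(),  # e.g. burst, rwitm, A0, Tag, ...
        "time_ns": int(m.group("time")),
    }
```

Keeping the middle fields as an unparsed token list avoids guessing the meaning of tokens such as `burst` or `Pos_Ack`.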

2.4 The current testing methodology 

Currently, the input tables are written “by hand”, and the analysis of the output file is also done “by hand” using some empirical rules. The analysis consists in comparing each line of the probe file with what was specified in the test purpose, which is informally described in the test plan document. The main problem here is that the analysis task is based entirely on informal specifications and an informal notion of conformance. This raises the question of the correctness of these tests, and therefore of the confidence that can be placed in the associated verdicts. The approach using TGV solves this problem, since all the objects (specification, test purposes, ...) are formally specified.



3 Automatic test generation with TGV

The prototype TGV, which we developed in the Pampa team at Inria-Rennes in collaboration with the Spectre team of the Verimag laboratory at Grenoble, is dedicated to the automatic generation of conformance tests for protocols based on their formal specification. Given a formal specification of the system to be tested and a formal description of a test purpose (which represents an abstract form of the property to be tested), TGV generates an abstract test case. This is a directed acyclic graph in which each path represents a test sequence with an associated verdict, which indicates whether the implementation under test (IUT) conforms to the specification or not. Details of the TGV algorithms can be found in [5,6,7]. Here we present only the elements (described in Figure 3) that take part in generating a test case for the Bull’s CC_NUMA architecture.



[Figure 3: the formal specification CC_NUMA_spec.lotos is compiled by caesar.adt and caesar -open into CC_NUMA_spec.c and CC_NUMA_spec.h, then by the C compiler into CC_NUMA_spec.o; this is linked with the TGV libraries FERMDET_OPEN (abstraction and determinization) and TGV_OPEN (traversal of the synchronous product and test case synthesis) into the executable tgv_CC_NUMA, which takes the specialization parameters CC_NUMA_spec.rename and CC_NUMA_spec.hide and a test purpose (txx_obj.inform: context + behaviour; txx_obj.aut), and produces txx_test.aut.]

Fig. 3. TGV General Architecture using LOTOS entry



The first main input of TGV is the formal specification of the system. The CAESAR.ADT compiler of the CADP toolbox [8] is used to compile the data part of the specification. The CAESAR compiler produces the C file corresponding to the control part, including the functions (Init, Fireable, Compare, ...) that TGV needs to manipulate the state graph of the system “on-the-fly” (without generating it) [5]. Then the C compiler produces the corresponding object file (CC_NUMA_spec.o in Figure 3).

Depending on the properties to be tested, some observable interactions described in the LOTOS specification may be judged unimportant for the testing activity. Those interactions must be considered unobservable. TGV handles this with a hiding mechanism (CC_NUMA_spec.hide in Figure 3), which lists all the interactions to be considered internal to the system. The semantics of LOTOS (and hence the CAESAR compiler) makes no distinction between inputs and outputs, because interactions between processes are synchronization events. However, TGV needs to distinguish controllable events (from tester to implementation) from observable events (from implementation to tester) in the generated test cases. We introduced a renaming mechanism in TGV to solve this problem.

The other main input of TGV is the formal test purpose from which a test case is to be generated. It is formalized as an automaton in Aldebaran format (see an example in section 3.2). The libraries FERMDET_OPEN and TGV_OPEN contain the functions that perform “on-the-fly” all the operations (abstraction, reduction, determinization and test case synthesis) leading to the generation of the test case. This addresses the combinatorial explosion problem, which makes most tools unable to generate test cases for complex systems. Linking the object file with the two libraries (FERMDET_OPEN and TGV_OPEN) produces an executable (tgv_CC_NUMA in Figure 3).

Given a formal test purpose (txx_obj.aut) and the specialization files (CC_NUMA_spec.rename and CC_NUMA_spec.hide) as parameters of this executable, TGV generates the corresponding test case in the form of a “decorated” DAG (Directed Acyclic Graph). Each path of this DAG represents a test sequence.



3.1 Formal specification of the cache coherency protocol 

The formal specification is composed of 3 modules and consists of about 2000 LOTOS lines, of which one half describes the control part (13 processes) and the other half defines the ADT (Abstract Data Types) part. Since TGV treats this formal specification as the reference model of the system, it has to be thoroughly debugged and verified. This has been done with appropriate formal verification techniques [8]. In the following, the 3 modules are called M0, M1 and M2. Each module contains one processor, called P0. There are two block addresses in the system, called A0 and A1, and two data values, D0 and D1. These blocks are physically located in module M0. Two main reasons led us to make some abstractions:

• The first reason is the size and complexity of the Bull’s CC_NUMA architecture, whose direct consequence is the state explosion problem, even though TGV works “on-the-fly”. Thus, some causally dependent operations concerning the same transaction are collapsed. In a remote transfer, for example, an event from the sending agent is followed by an event for the receiving agent. To reduce the complexity, these two transactions are collapsed into one event and modeled by a gate in the LOTOS specification.
• The second reason is that in this work we are interested in test generation for the Cache Coherency Protocol. So we make the abstractions needed to hide all other operations that do not concern this protocol. For example, the local response transaction always follows a local bus transaction atomically (although the real system can do something else between these two actions). These two transactions are collapsed in the LOTOS specification, as are all events between a bus operation and its response.

Notice that these abstractions do not change the semantics, since during execution we also apply the corresponding abstractions to the probe output files (see the TRANSLATOR application in section 4).



3.2 Formalization of the test purposes 

In TGV, a test purpose describes an abstract view of the test case, and it is modeled by a labeled automaton in the Aldebaran syntax [8]. The format of a transition is (from_state, label, to_state). A label is a LOTOS gate followed by a list of parameters. As an example, Figure 4 gives the automaton that formalizes the informal test purpose described in section 2.2.



des (0,8,7)
(0,"?BUS_TRANS !M1 !FLUSH !A0 !PROCESSOR !FALSE",1)
(1,"*",1)
(1,"BUS_TRANS !M0 !FLUSH !A0 !RCC_INQ !FALSE",2)
(2,"?BUS_TRANS !M0 !RWITM !A0 !PROCESSOR !FALSE",3)
(3,"LMD_GET !M0 !OUTQIO !A0 !A0 !RCC_INV !FLAG(FALSE, FALSE) !BLK_COLL",4)
(4,"LOC_RESP !M0 !ARESP_RETRY",5)
(5,"*",5)
(5,"PACKET_TRANSFER !M0 !M1 !RESP_PACKET_TYPE !NIL_DATA !NETRESP_DONE !OUTQIO",6)
ACCEPT 6

Fig. 4. An example of formalized test purpose



As mentioned before, TGV needs to distinguish between the input and output actions of the system. This is done simply by the first occurrence of “?” (for input) or “!” (for output) in the label. The transitions corresponding to the actions described in the informal test purpose are easy to recognize. For example, the first transition indicates that Module#1 requests a FLUSH transaction on the block address A0.
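The transition format and the “first `?` or `!`” rule are simple enough to read mechanically. The Python sketch below parses an automaton written like Figure 4: a `des (initial, #transitions, #states)` header, one `(from,"label",to)` line per transition, and the `ACCEPT n` line, which follows the figure rather than the plain Aldebaran format. It then classifies each label by the rule just stated. The function names are hypothetical, and this is not TGV's own parser.

```python
import re

HEADER_RE = re.compile(r"^des\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)$")
TRANS_RE = re.compile(r'^\(\s*(\d+)\s*,\s*"(.*)"\s*,\s*(\d+)\s*\)$')


def parse_aut(text):
    """Return (initial state, number of states, transitions, accept state)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = HEADER_RE.match(lines[0])
    if header is None:
        raise ValueError("missing des header")
    init, n_trans, n_states = map(int, header.groups())
    transitions, accept = [], None
    for ln in lines[1:]:
        if ln.startswith("ACCEPT"):
            accept = int(ln.split()[1])
            continue
        m = TRANS_RE.match(ln)
        if m is None:
            raise ValueError(f"bad transition line: {ln!r}")
        transitions.append((int(m.group(1)), m.group(2), int(m.group(3))))
    if len(transitions) != n_trans:
        raise ValueError("transition count differs from the des header")
    return init, n_states, transitions, accept


def direction(label):
    """'input' or 'output' according to the first '?' or '!' in the label."""
    for ch in label:
        if ch == "?":
            return "input"
        if ch == "!":
            return "output"
    return None  # e.g. the "*" (otherwise) label
```

Checking the transition count against the `des` header catches truncated files early.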

The statement ACCEPT 6 tells TGV that state 6 is the acceptance state of the test purpose. When module M0 sends a response (noted NETRESP_DONE) to Module#1 notifying the good completion of the transaction, TGV should consider the test purpose reached. The last transition of the test purpose expresses this. The label “*” stands for “otherwise”. With the transition (1,"*",1), TGV takes other intermediate observations into account until it observes the specified observations (from state 1).
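The “otherwise” reading of `*` can be illustrated by driving a test purpose with a trace of labels. The sketch below is a deliberate simplification of our own. TGV itself computes a synchronous product with the specification rather than replaying single traces. Here a `*` transition fires only when no explicit label matches in the current state, and a label with no transition at all stops the run.

```python
def step(transitions, state, label):
    """Next state of the test purpose; '*' fires only if no explicit label matches."""
    explicit = [dst for src, lab, dst in transitions if src == state and lab == label]
    if explicit:
        return explicit[0]
    otherwise = [dst for src, lab, dst in transitions if src == state and lab == "*"]
    return otherwise[0] if otherwise else None


def reaches_accept(transitions, accept, trace, init=0):
    """True as soon as the trace drives the test purpose into the ACCEPT state."""
    state = init
    for label in trace:
        state = step(transitions, state, label)
        if state is None:
            return False
        if state == accept:
            return True
    return state == accept


# Toy purpose: "a", then anything ("*" loop), until "b" is observed.
tp = [(0, "a", 1), (1, "*", 1), (1, "b", 2)]
print(reaches_accept(tp, 2, ["a", "x", "y", "b"]))  # prints True
```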

We do not describe here the complete test purpose, which includes the refusal state. The refusal state tells TGV not to consider, while generating the test case, the labels of the transitions leading to it. In our example, there are 36 such transitions, repeated for each of the 5 non-final states of the test purpose. This may seem complicated, but we have developed software that generates these transitions automatically.



3.3 Generated abstract test cases 

We do not describe the whole test case (20 transitions and 20 states) generated from the formal test purpose of section 3.2, which corresponds to the informal one of section 2.2. Notice that most of the generated test cases contain more than 400 states and transitions. Because of their complexity, such test cases are difficult to obtain by hand, even for experts. Here we give and comment on some of the first significant lines of the test case:

des (0, 20, 20)
(0,"!BUS_TRANS !M1 !FLUSH !A0 !PROCESSOR !FALSE",1)
(1,"RCT_GET ?M1 !OUTQIO !A0 !A0 !RCC_INV !NO_COLL",2)
(2,"LOC_RESP ?M1 !ARESP_NULL",3)
(3,"PACKET_TRANSFER ?M1 !M0 !FLUSH !A0 !REQ_PACKET_TYPE !NIL_DATA !NETRESP_NIL !OUTQIO !M1 !OUTQIO !CO",4)
(4,"LMD_GET ?M0 !INQIO !A0 !A0 !RCC_INV !FLAG(FALSE, FALSE) !NO_COLL",5)
(5,"BUS_TRANS ?M0 !FLUSH !A0 !RCC_INQ !FALSE",6)
(6,"LOC_RESP ?M0 !ARESP_RETRY",7)
(7,"!BUS_TRANS !M0 !RWITM !A0 !PROCESSOR !FALSE",8)
(8,"LMD_GET ?M0 !OUTQIO !A0 !A0 !RCC_INV !FLAG(FALSE, FALSE) !BCK_COLL",9)
(9,"LOC_RESP ?M0 !ARESP_RETRY",10)
(10,"BUS_TRANS ?M0 !FLUSH !A0 !RCC_INQ !FALSE",11)
(11,"LOC_RESP ?M0 !ARESP_NULL",12)
(12,"PACKET_TRANSFER ?M0 !M1 !RESP_PACKET_TYPE !NIL_DATA !NETRESP_DONE !OUTQIO, (PASS)",13)
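Test cases such as this one are plain Aldebaran (.aut) graphs: a `des (initial, transitions, states)` header followed by one `(source,"label",target)` line per transition. As a rough illustration of how a tool might load them, here is a minimal Python parser for that layout. The function name `parse_aut` and its return shape are our own; labels are assumed not to contain double quotes, although they may contain commas and parentheses, as `FLAG(FALSE, FALSE)` and `(PASS)` do.

```python
import re

HEADER = re.compile(r"^des\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)$")
# Greedy "(.*)" keeps commas and parentheses that occur inside the label.
EDGE = re.compile(r'^\(\s*(\d+)\s*,\s*"(.*)"\s*,\s*(\d+)\s*\)$')


def parse_aut(text):
    """Parse an Aldebaran graph.

    Returns (initial_state, declared_transition_count, transitions), where
    transitions maps each source state to a list of (label, target) pairs.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = HEADER.match(lines[0])
    if header is None:
        raise ValueError("missing 'des' header")
    initial, declared, _states = map(int, header.groups())
    transitions = {}
    for ln in lines[1:]:
        m = EDGE.match(ln)
        if m is None:
            raise ValueError(f"bad transition line: {ln}")
        src, label, dst = int(m.group(1)), m.group(2), int(m.group(3))
        transitions.setdefault(src, []).append((label, dst))
    return initial, declared, transitions
```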



In addition to other intermediate actions generated by TGV, one can recognize
the reverse form (outputs become inputs) of the actions described in the formal test
purpose. Thus, the first transition is the first stimulus of the tester and consists of a
FLUSH transaction requested by Module#1. This is expected to be a remote operation,
as the target of this transaction is the address location A0 (local to M0).
The transition (5,"BUS_TRANS ?M0 !FLUSH !A0 !RCC_INQ !FALSE",6) indicates
that the FLUSH operation has correctly arrived at Module#0 and has been
run on the local bus of Module#0. From that point on, every local operation of
Module#0 on the same address A0 (in the example: (7,"!BUS_TRANS !M0 !RWITM
!A0 !PROCESSOR !FALSE",8)) leads to a block collision: (8,"LMD_GET ?M0
!OUTQIO !A0 !A0 !RCC_INV !FLAG(FALSE, FALSE) !BCK_COLL",9). At
that point, in conformity with the specification, the local operation is retried
until the remote operation has completed: (9,"LOC_RESP ?M0
!ARESP_RETRY",10). The remote operation on Module#0 ends with
a DONE remote response on the remote link: (18,"PACKET_TRANSFER ?M0 !M1
!RESP_PACKET_TYPE !NIL_DATA !NETRESP_DONE !OUTQIO, (PASS)",19).
The other transitions of this test case correspond to other execution orders of
the operations described above.



4 Making the generated test cases executable in the SIM1 environment

An abstract test case generated by TGV is a directed acyclic graph in which each
branch describes a sequence of interactions between the tester and the system
under test. This way of generating test cases is suitable for the conformance testing
of network protocols, where the testing activity is "interactive". Even though
some tests would better be executed in an interactive way, we have seen (see
Section 2.3) that the usual testing activity in the SIM1 environment is rather off-line
("batch"), as it consists of three independent steps: (a) stimulating the system, (b)
collecting all the observations, (c) analyzing them and emitting a verdict.

Our first goal was to demonstrate that this usual manual testing approach can
be automated. We therefore implemented a batch testing environment for
the execution of interactive abstract tests. Figure 5 shows the overall structure
of the tester package we have developed. It consists of three applications called
EXCITATOR, TRANSLATOR and ANALYSOR. A complete example of how
these applications fit together to execute batch tests is described in [9]. Notice
that the main difference with the interactive approach described hereafter is
that, in the batch approach, the three applications are launched in turn (first
EXCITATOR, then TRANSLATOR and finally ANALYSOR, as indicated in
Figure 5) by hand, once for each test case. Moreover, some tests
(those which need interactivity) cannot be executed efficiently. We then also
proposed and implemented an interactive testing environment in order to
keep the benefit brought by the interactive nature of the tests generated by TGV.



4.1 The interactive version of the tester package: an example 

Let us now consider the informal test purpose corresponding to the address
collision situation described in Section 2.2. In this case, the test case is clearly
interactive because, after the FLUSH transaction (requested by Module#1), the
RWITM operation (requested by Module#0) can be initiated only after the
observation of the FLUSH operation on the local bus of Module#0, which means
that the operation has been accepted by the Presence Cache of Module#0.

We do not describe all the steps of the interactive execution of this
test case. The important point here is to show, through some significant
steps, how the three applications (EXCITATOR, TRANSLATOR and ANALYSOR)
fit together for interactive testing.
Fig. 5. The Bull's CC_NUMA SIM1 testing environment using TGV tests



Both EXCITATOR and TRANSLATOR take into account some Implementation
eXtra Information for Testing (IXIT, called IXIT_FILE_K in Figure 5). This
information describes the mapping between the abstract data values of the formal
specification and the real data values of the system under test.
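To make the role of the IXIT concrete, here is a minimal Python sketch of the abstract-to-concrete mapping applied to a stimulus label. The table contents are assumptions for illustration only: the address 0000000000 and the MPB numbers 8 and 0 are read off the simulation logs shown in this section, but their association with A0, M1 and M0 is our guess, and the function name `concretize` is hypothetical.

```python
# Hypothetical IXIT table: abstract value -> concrete value in the SUT.
IXIT = {
    "A0": "0000000000",  # address location local to Module#0 (assumed)
    "M1": "8",           # MPB driving Module#1 (assumed)
    "M0": "0",           # MPB driving Module#0 (assumed)
}


def concretize(label, ixit=IXIT):
    """Split a stimulus label into its action name and its parameters,
    replacing every abstract parameter that has an IXIT entry by its
    concrete value. '!' and '?' markers are dropped."""
    name, *params = label.split()
    concrete = []
    for p in params:
        value = p.lstrip("!?")
        concrete.append(ixit.get(value, value))  # unmapped values pass through
    return name.lstrip("!?"), concrete


if __name__ == "__main__":
    print(concretize("!BUS_TRANS !M1 !FLUSH !A0 !PROCESSOR !FALSE"))
    # ('BUS_TRANS', ['8', 'FLUSH', '0000000000', 'PROCESSOR', 'FALSE'])
```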

The EXCITATOR application converts the stimuli included
in the test case (called TEST_CASE_X.AUT in Figure 5), which is described in the
Aldebaran format of TGV, into a format readable by the MPBs. Once the conversion
is done, the EXCITATOR stimulates the MPBs. At the initial
clock cycle of the simulation, EXCITATOR is invoked to extract the first
stimulus from the test case (the FLUSH requested by Module#1) and proceeds
to the stimulation of the MPBs. Then, the VSS kernel generates the probe output
line given below (called PROBE_OUT_X in Figure 5). This line describes
the requested transaction as actually observed on the system under test.

****** Launching the simulation:
# run 10000
***** first excitator action: START!!
* MPB/620 Bus, Behavioral Model PseudoCompiler *
* Jan 29, 1998 *
End of run
command detected MPB num. : 8 command : word(h0000, [], z000FFC020000000000000001000000000000000000000000, []).
PROBE #1 --> L_Bus 620 16 byte flush AO Tag 00 addr=0000000000 Pos_Ack Resp_Null at time 580 NS
The TRANSLATOR application is then in charge of translating the probe
outputs into a trace in the specification model. This translation is necessary
to make it possible to analyze the observation with respect to what has been
foreseen in the specification. At the next clock cycle, TRANSLATOR converts
the probe output line into the Aldebaran format, using abstractions
corresponding to those made in the formal specification (see Section 3.1). The
result is submitted to the ANALYSOR application.
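As a rough illustration of this step, the following Python sketch turns an L_Bus probe line into the two abstract labels that ANALYSOR reports as traversed for the first probe line. It is a simplification under stated assumptions, not the actual TRANSLATOR, which the authors generated with compiler generators. We assume that probe number k corresponds to module Mk, that the address is abstracted through a reverse IXIT table, and that the initiator is always PROCESSOR. The last assumption is wrong for remote operations such as the FLUSH replayed by RCC_INQ. The regular expression and the names `translate` and `ADDRESS` are hypothetical.

```python
import re

# Hypothetical reverse IXIT: concrete address -> abstract address.
ADDRESS = {"0000000000": "A0"}

PROBE = re.compile(
    r"PROBE\s*#\s*(\d+)\s*-*>\s*L_Bus 620 .*?\b(flush|rwitm)\b.*?"
    r"addr=(\w+)\s+Pos_Ack\s+Resp_(\w+)"
)


def translate(line):
    """Map one L_Bus probe line to abstract labels; other lines give []."""
    m = PROBE.search(line)
    if m is None:
        return []
    probe, op, addr, resp = m.groups()
    module = f"M{probe}"                 # assumption: probe k observes Mk
    abstract_addr = ADDRESS.get(addr, addr)
    return [
        # assumption: the initiator is always PROCESSOR in this sketch
        f"BUS_TRANS !{module} !{op.upper()} !{abstract_addr} !PROCESSOR !FALSE",
        f"LOC_RESP !{module} !ARESP_{resp.upper()}",
    ]
```

For the probe line shown above, this yields `BUS_TRANS !M1 !FLUSH !A0 !PROCESSOR !FALSE` followed by `LOC_RESP !M1 !ARESP_NULL`.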

The obtained trace is analyzed by ANALYSOR against the test case
(called TEST_CASE_X.AUT) generated by TGV (see Section 3.3). The following
output of ANALYSOR shows the part of the test case that
has been traversed.



ANALYSER phase...
TC traversed part...
(0,"BUS_TRANS !M1 !FLUSH !A0 !PROCESSOR !FALSE",2)
(2,"LOC_RESP !M1 !ARESP_NULL",3)
done



The probe output file generated at each subsequent clock cycle is converted and
analyzed until one of the following conditions holds in the test case, which
ANALYSOR traverses synchronously:

- a verdict is found (an end of the test case is reached): it is then emitted,
the following probe lines generated by the VSS kernel are ignored, and the
simulation is interrupted;

- another stimulus is found: the EXCITATOR is then invoked to submit it to
the VSS kernel, and a new test cycle begins (the simulation continues);

- no corresponding transition is found: a FAIL verdict is emitted, indicating
that the implementation does not conform to the specification (with respect to the
corresponding test purpose).
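The control loop implied by these three conditions can be sketched as follows. This Python sketch is illustrative only and rests on several assumptions. The test case is a dict from states to `(label, target)` lists, as produced by an Aldebaran parser. A label starting with "!" is a tester stimulus, and each state has at most one. A label ending in ", (PASS)" carries the PASS verdict. Observations arrive already translated into the test case's own label syntax. The names `run_interactive` and `excite` are hypothetical, and returning `None` when the observations run out before any verdict covers a case the text does not discuss.

```python
VERDICT = ", (PASS)"


def run_interactive(test_case, initial, excite, observations):
    """Traverse a test case in step with the simulation.

    excite(label) submits a stimulus; observations is an iterable of
    abstract labels produced cycle after cycle by the translator.
    Returns "PASS", "FAIL", or None if observations end first.
    """
    obs_iter = iter(observations)
    state = initial
    while True:
        outgoing = test_case.get(state, [])
        stimuli = [(lab, dst) for lab, dst in outgoing if lab.startswith("!")]
        if stimuli:
            # Another stimulus is found: submit it, start a new test cycle.
            label, state = stimuli[0]
            excite(label)
            continue
        obs = next(obs_iter, None)
        if obs is None:
            return None  # simulation produced nothing more (not in the text)
        for label, dst in outgoing:
            if label.removesuffix(VERDICT) == obs:
                if label.endswith(VERDICT):
                    return "PASS"  # verdict found: stop the simulation
                state = dst
                break
        else:
            return "FAIL"  # no corresponding transition
```

A controllable test case (no choice between a stimulus and an observation in the same state) is assumed, which is why the sketch simply takes the first stimulus when one exists.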

Let us now jump to the step of the simulation where the second stimulus
(RWITM) is detected by the EXCITATOR. The collision in the home module
(Module#0) is indeed obtained:

EXCITATOR ACTION..
-- Input file for Proc. 0
RWITM WT ADDR=0000000000 TARGET=0
command detected MPB num. : 0 command : word(h0000, [], z000FFC360000000000000001000000000000000000000000, []).
RCC #0 --> out_q_behavior.vhd: L_Respout_Retry for Collision detected dbg_info= 000 at time 2240 NS
RCC #0 --> out_q_entry.vhd: freeing OUT_Q #02 dbg_info= 000 at time 2260 NS
PROBE #0 > BLINK SID=2 fm 0000 to part 0010 R_tag=05 R_Done at time 2300 NS
PROBE #0 > L_Bus 620 burst rwitm 80 Tag 00 addr=0000000000 Pos_Ack Resp_Retry at time 2320 NS
At the last analysis phase, a PASS verdict is detected by the ANALYSOR
(see below). This verdict states that the behavior of the IUT conforms to
the specification with respect to the test purpose. It means that the FLUSH operation
terminates correctly, passing all the check-points despite the colliding RWITM
transaction.



ANALYSER phase...
TC traversed part...
(9,"PACKET_TRANSFER !M0 !M1 !RESP_PACKET_TYPE !NIL_DATA !NETRESP_DONE !OUTQIO, (PASS)",1)
=>IUT(0), TC(9) : ***** PASS ...
End of Test Case

The main difference with batch testing is that all these steps are chained
automatically, clock cycle after clock cycle, as many times as needed until the
end of the test case. This allows the execution of tests in which more than one
stimulus is necessary and the next stimulus depends on the reactions of the
system observed after the previous one, as is the case with tests generated by
TGV. Moreover, this approach also increases the test coverage.

5 Results of the experiment and analysis 

For each step of this experiment described below, we indicate how we
resolved the problems encountered, how much it cost, and what its
significant results were.

Formal specification The first task was to obtain a formal specification of the
Bull's CC_NUMA architecture that was as suitable as possible for describing hardware
and for test generation using TGV. The reasons for choosing the LOTOS
language are given in Section 3.1. Appropriate abstractions were also made
to avoid needlessly complicated aspects of the system in the specification
(see Section 3.1). As this specification is considered as a reference by TGV, it
was important to guarantee that it is error-free. This work was done by Bull,
and the first version took about 8 man-months. Modifications were
made until the end of the experiment. Starting from the formal specification
used for verification, it took 1 man-month to adapt it for test generation
purposes. Note also that some bugs were detected while writing this
formal specification.

Improvements of TGV The first version of TGV (before this experiment)
accepted only specifications in SDL or in the Aldebaran format. Because the LOTOS
language had been chosen for the specification, we had to extend
TGV to handle specifications written in this language. The
problems encountered and the solutions developed are explained in Section 3.
Other improvements of TGV aimed at refining the generated test cases were
needed and implemented during the experiment, such as:

- the introduction of refusal states in the test purposes, which reduce the
part of the specification traversed;

- the generation of loops in the test cases (not possible before
this experiment), leading to fewer Inconclusive verdicts; this allows more
functionalities to be tested.

This work was done by Inria-Rennes and cost about 8 man-months.
The main benefit is that this experiment is the first one showing the value
of the on-the-fly generation available in TGV. Indeed, it was impossible to
obtain the state graph of the Bull's CC_NUMA specification, so the only way
to obtain tests was to work on-the-fly.
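The essence of on-the-fly generation is that the specification is explored only through a successor function, in product with the test purpose, and only as far as needed. The Python sketch below illustrates this idea with a plain breadth-first search for one path to an accepting state of the test purpose. It is much simpler than TGV, which builds complete test graphs, handles refusal states and uses its own algorithms. The names `purpose_step` and `find_accepting_path`, the convention that the test purpose starts in state 0, and the `max_states` bound are our assumptions.

```python
from collections import deque


def purpose_step(purpose, q, label):
    """Next test-purpose state on `label`; the "*" (otherwise) transition
    is used only when no explicit transition of q matches the label."""
    explicit = [dst for lab, dst in purpose.get(q, []) if lab == label]
    if explicit:
        return explicit[0]
    others = [dst for lab, dst in purpose.get(q, []) if lab == "*"]
    return others[0] if others else None


def find_accepting_path(spec_init, spec_succ, purpose, accept, max_states=100_000):
    """BFS over (spec state, purpose state) pairs generated on demand.

    The specification graph is never built in full; only the visited part
    of the product is stored. Returns a shortest label path or None.
    """
    start = (spec_init, 0)  # assumption: the test purpose starts in state 0
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node[1] in accept:
            labels = []
            while parent[node] is not None:
                node, label = parent[node]
                labels.append(label)
            return labels[::-1]
        s, q = node
        for label, s2 in spec_succ(s):
            q2 = purpose_step(purpose, q, label)
            if q2 is None:
                continue  # label not allowed by the test purpose
            nxt = (s2, q2)
            if nxt not in parent and len(parent) < max_states:
                parent[nxt] = (node, label)
                queue.append(nxt)
    return None


# Example: an unbounded counter, whose full state graph cannot be built.
def counter_succ(n):
    return [("inc", n + 1), ("reset", 0)]


PURPOSE = {0: [("inc", 1), ("*", 0)], 1: [("reset", 2), ("*", 1)]}
```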

Abstract test cases generation We have formally specified all the test purposes
described in Test Groups 3 and 4 (see Section 2.2), including
those requiring an interactive behavior of the system. This work cost about
15 man-days. For each test purpose, we have generated the corresponding
abstract test case using TGV. Since whoever can do more can do less, we also
generated test cases for some basic operations. In total, 75 tests were
generated, at a cost of 1 man-month. The main problem here concerns
the time needed for test generation with TGV: from less than 1 second for
some tests to about 12 hours for others. This is due to the complexity of the
Bull's CC_NUMA architecture specification, which sometimes required us
to refine the test purposes in order to speed up test generation with TGV.

Developing the tester package The main difficulty in executing the test cases
was that the format of the test cases differs from the probe
output format. Developing the tester package that solves this problem
cost Inria-Rennes about 5 man-months.

Since the applications which constitute this tester package are generic and 
automatically produced using classical compiler generators, they can be 
reused to test other systems without major effort. 

Using the tester package All the test cases generated by TGV have been
executed in the SIM1 testing environment using the tester package. For each
test case and the corresponding probe output file, the presence of the tester
package adds no noticeable overhead to the simulation time.
The maximal time needed to execute all 75 tests is estimated at less than
20 hours (1 day on a full-time basis), assuming 1000 cycles per test, 0.6
seconds per cycle and 5 minutes for environment loading.
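Spelled out with these figures, each test takes 1000 × 0.6 s of simulation plus 5 minutes of loading, which confirms the bound:

$$
75 \times \bigl(1000 \times 0.6\,\mathrm{s} + 5 \times 60\,\mathrm{s}\bigr) = 75 \times 900\,\mathrm{s} = 67\,500\,\mathrm{s} = 18.75\,\mathrm{h} < 20\,\mathrm{h}
$$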

Results and analysis The main benefit of the TGV approach is that we
only have to formally specify the system under test and the test purposes; all
the remaining testing activity is then completely automated. The time spent in
specifying the Bull's CC_NUMA architecture, formalizing the test purposes
and generating the test cases with TGV is fully repaid by the improved
correctness of, and greater confidence in, the implementation. This approach
made it possible to detect 5 bugs, mainly concerning address collisions, as well as
test coverage problems (some situations had not been tested). For instance,
the Presence Cache and the Remote Cache directory were sometimes not updated
in the order described in the specification.
6 Conclusion



In this paper, we have presented an end-to-end industrial experiment which
demonstrates that the TGV prototype, which was developed for conformance
testing of communication protocols, can also be used efficiently to test hardware
architectures. Indeed, and this is the main result of this experiment, the approach
improved both the quality of the tests and the test coverage: using the interactive
approach, we detected bugs that had not been detected manually by hardware
testing experts. The experiment also brought significant improvements both to
conformance test generation with TGV at Inria-Rennes and to off-line hardware
testing at Bull: this approach will be used for another architecture under
construction at Bull.

We are now working on further improving our test coverage by using more general
test purposes and leaving TGV to decide which actions to perform on the system to cover
a particular situation.
Abstract. We present a deductive verification framework that combines
deductive reasoning, general purpose decision procedures, and domain-specific
reasoning. We address the integration of formal as well as informal
domain-specific reasoning, which is encapsulated in the form of
user-defined inference rules. To demonstrate our approach, we describe
the verification of an SRT divider where a transistor-level implementation
with timing is shown to be a refinement of its high-level specification.



1 Introduction 

Most formal verification of hardware designs is based on state-space exploration
or theorem proving. State-space exploration provides an automatic approach for
verifying properties of designs described by relatively small models. In principle,
theorem proving techniques can be applied to much larger and more detailed
design descriptions. However, the large demands on the time of expert users
prevent the wide-scale application of theorem proving techniques.

The strengths and weaknesses of state-space exploration and theorem prov- 
ing are in many ways complementary. This has motivated several recent ef- 
forts to combine the two techniques [5]. One approach is to embed state-space 
exploration algorithms as decision procedures in a general purpose theorem 
prover [20]. In this approach, the design and specification are represented by formulas
in the logic of the prover, and decision procedures are oracles, introducing
new theorems into the system. Alternatively, some researchers have augmented 
state-space exploration tools with simple theorem proving capability [12,1,18]. 

Viewing the verification task as one of maximizing the probability of produc- 
ing a correct design subject to schedule and budget constraints, we generalize the 
latter approach. Using domain-specific and possibly informal decision procedures
and inference rules in a deductive framework, we can verify critical properties 
of real designs that would not be practical to verify by theorem proving and/or 
model checking alone. Section 2 elaborates this claim. Section 3 describes our 
implementation of this framework, and section 4 presents our verification of a 
self-timed divider using this tool. 
1.1 Running Example: Asynchronous Divider Verification

Our divider verification establishes refinement between progressively more de- 
tailed descriptions of the design written in the Synchronized Transitions lan- 
guage [25]. The highest level model is an abstract specification of radix-2 SRT 
division on rational numbers; we prove functional correctness of the algorithm 
at this level. The most detailed model formalizes the transistor-level structure 
along with its timing properties. Each level of the hierarchy inherits the safety 
properties of the higher levels: by showing that the top-level model divides correctly,
we establish that all of the lower-level models divide correctly as well.¹
Although there have been many published verifications of dividers, we believe 
that our work is distinguished by spanning the complete design hierarchy. 

1.2 Synchronized Transitions 

A Synchronized Transitions (abbr. ST) [25] program is an initial state predicate 
and a collection of transitions. A transition is a guarded command. For example, 

<< x > y -> x, y := y, x >>

is a transition that is enabled to swap x and y when x is greater than y. Transitions
may be combined using the asynchronous combinator, ||, for example
t1 || t2 || ... || tn. Program execution consists of repeatedly selecting a transition,
testing its guard, and, if the guard is satisfied, performing the multi-assignment.
The order in which transitions are selected is unspecified: this non-determinism
models arbitrary delays in a speed-independent model. ST provides other
combinators and other language features which are not presented in this paper.
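The execution model just described can be mimicked in a few lines. The following Python sketch is an illustration, not ST itself. A transition is a pair `(guard, assign)` over a dict-valued state, and `assign` returns the new values computed from the old state, so the multi-assignment is simultaneous. A random choice models the unspecified selection order. Stopping when no transition is enabled is a convenience of the sketch, since a real ST program simply keeps selecting disabled transitions. The example composes three copies of the swap transition above with ||, which sorts four variables.

```python
import random


def run(state, transitions, rng, max_steps=10_000):
    """Repeatedly select a transition at random, test its guard, and
    perform its multi-assignment if the guard holds."""
    for _ in range(max_steps):
        if not any(guard(state) for guard, _ in transitions):
            return state  # sketch convenience: nothing can fire any more
        guard, assign = rng.choice(transitions)
        if guard(state):
            # assign reads the old state only: a simultaneous assignment
            state = {**state, **assign(state)}
    return state


def swap(i, j):
    """The transition << v_i > v_j -> v_i, v_j := v_j, v_i >>."""
    return (lambda s: s[i] > s[j], lambda s: {i: s[j], j: s[i]})


# t0 || t1 || t2: adjacent swaps over x0..x3
PROGRAM = [swap(f"x{k}", f"x{k + 1}") for k in range(3)]

if __name__ == "__main__":
    final = run({"x0": 3, "x1": 1, "x2": 2, "x3": 0}, PROGRAM, random.Random(1))
    print(final)  # sorted once no transition is enabled
```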

1.3 Semantics 

We employ a wp semantics (see [8]) for ST. If P is a program and Q is a predicate,
then wp(P, Q) is the weakest condition that must hold such that Q is
guaranteed to hold after any single action allowed by P is performed. Consider a
transition << G -> M >>: the guard, G, denotes a function from program states to
the Booleans; the multi-assignment, M, denotes a function from states to states.
A wp semantics of ST includes

$$\mathit{wp}(\langle\!\langle G \rightarrow M \rangle\!\rangle, Q) = G \Rightarrow Q \circ M$$

$$\mathit{wp}(t_1 \parallel t_2 \parallel \cdots \parallel t_n, Q) = \bigwedge_{i=1}^{n} \mathit{wp}(t_i, Q)$$

We make extensive use of invariants. A predicate I is an invariant if I holding
in some state ensures that I will hold in all possible subsequent states of the
program. In particular, I is an invariant of P iff I ⇒ wp(P, I). A predicate Q is

¹ A detailed description of the refinement proofs between an intermediate model and the
transistor-level model can be found in [15].
a safety property of P if Q holds in all states reachable in any execution of P.
As shown in [13], Q is a safety property of P if and only if there is an invariant
I such that Q0 ⇒ I and I ⇒ Q, where Q0 denotes the initial state predicate of P.
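For a small, explicitly enumerated state space, both definitions can be checked by brute force. This Python sketch does so as an illustration only; the framework in the paper reasons deductively, not by enumeration. Unfolding the wp equations, I ⇒ wp(P, I) means that, for every transition, I ∧ G implies I ∘ M. The second function checks the sufficient direction of the safety characterization for one given candidate I. The state space, the names `is_invariant` and `proves_safety`, and the example predicates are our assumptions.

```python
from itertools import product


def swap_xy():
    """<< x > y -> x, y := y, x >> as (guard, multi-assignment)."""
    return (lambda s: s["x"] > s["y"], lambda s: {"x": s["y"], "y": s["x"]})


def is_invariant(inv, transitions, states):
    """Check I => wp(P, I): from every I-state, every enabled transition
    leads to an I-state (I and G imply I o M, transition by transition)."""
    for s in states:
        if inv(s):
            for guard, assign in transitions:
                if guard(s) and not inv({**s, **assign(s)}):
                    return False
    return True


def proves_safety(inv, init, q, transitions, states):
    """Q0 => I, I => Q and I invariant: then Q is a safety property."""
    return (all(inv(s) for s in states if init(s))
            and all(q(s) for s in states if inv(s))
            and is_invariant(inv, transitions, states))


STATES = [{"x": x, "y": y} for x, y in product(range(4), repeat=2)]
PROGRAM = [swap_xy()]
```

For instance, x + y = 3 is an invariant of the swap program, and from the initial state x = 3, y = 0 it establishes the safety property x ≠ y.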

Intuitively, program P' is a refinement of P if every reachable state transition
that P' can make corresponds to a move of P. More formally, refinement is
defined with respect to an abstraction mapping A that maps the states of P'
to those of P [2]. P' is a refinement of P under abstraction mapping A iff for every
reachable state s1' of P' and every state s2' that is reachable by performing a
single transition of P' from s1', either A(s1') = A(s2') (a stuttering action), or
there is a transition of P that effects a move from A(s1') to A(s2').
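The definition translates directly into a check over the reachable states of a small finite program. The following Python sketch is an illustration under our own assumptions: states are hashable tuples, transitions are `(guard, assign)` pairs returning the successor state, and the example programs are invented. The concrete program needs two steps (a stuttering phase toggle, then an increment) for each step of an abstract modulo-4 counter.

```python
from collections import deque


def reachable(inits, transitions):
    """All states reachable from `inits` (breadth-first)."""
    seen, queue = set(inits), deque(inits)
    while queue:
        s = queue.popleft()
        for guard, assign in transitions:
            if guard(s):
                t = assign(s)
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
    return seen


def refines(concrete_inits, concrete, abstract, A):
    """P' refines P under A: every single step s1' -> s2' from a reachable
    s1' either stutters (A(s1') == A(s2')) or matches a step of P."""
    for s1 in reachable(concrete_inits, concrete):
        for guard, assign in concrete:
            if not guard(s1):
                continue
            a1, a2 = A(s1), A(assign(s1))
            if a1 == a2:
                continue  # stuttering action
            if not any(g(a1) and m(a1) == a2 for g, m in abstract):
                return False
    return True


# Abstract P: a counter modulo 4. Concrete P': state (n, phase).
ABSTRACT = [(lambda n: True, lambda n: (n + 1) % 4)]
TOGGLE = (lambda s: s[1] == 0, lambda s: (s[0], 1))
STEP = (lambda s: s[1] == 1, lambda s: ((s[0] + 1) % 4, 0))
BAD_STEP = (lambda s: s[1] == 1, lambda s: ((s[0] + 2) % 4, 0))


def project(s):
    return s[0]  # abstraction mapping A: forget the phase
```

Here `refines([(0, 0)], [TOGGLE, STEP], ABSTRACT, project)` holds, while replacing `STEP` by `BAD_STEP`, which skips a count, breaks the refinement.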

2 Verification Approach 

Like many theorem provers, our verification tool presents a deductive style of 
verification. However, there are three ways in which our approach differs from 
traditional theorem proving: 

Integration of informal reasoning. Domain-specific decision procedures and 
inference rules can be used in our framework. Such procedures provide an 
algorithmic encapsulation of formal or informal domain expertise; this allows 
domain expertise to be introduced as a hypothesis of a proof. 

Syntactic embedding of the HDL. Our framework favours an embedding of 
the hardware description language (HDL) at a syntactic level. Inference rules 
operate directly on the HDL’s abstract syntax. 

Merging of inference rules and decision procedures. In traditional theo- 
rem provers, inference rules provide pattern-based rewriting of proof obliga- 
tions, while decision procedures (if any) decide the validity of leaf obligations 
in a proof tree. In our framework, inference rules may perform non-trivial 
computations to decide the soundness of a proof step, or to derive the result 
of an inference step. 



2.1 Informal reasoning in formal verification 

At first, the suggestion of allowing informal reasoning to be introduced into a 
formal proof appears to be outrageous: if an informal inference rule is unsound, it 
can invalidate any proof in which the rule is used. However, informal rules provide 
a practical way to tailor our verification tool to specific domains and verify 
properties that would not be practical to address by strictly formal approaches. 
When errors are found in a design, the verification effort is worthwhile even if 
some steps are justified only informally. 

Informal reasoning is commonplace in many verification efforts. For example, 
model-checking is typically applied to an abstraction of the design that was 
produced informally by a verification expert [11,19]. Although the absence of 
errors in the abstraction does not guarantee the correctness of the actual design, 
errors found in the abstraction can reveal errors in the actual design. Many 




A Light-Weight Framework for Hardware Verification 333 



theorem-prover based verifications model functional units at the register transfer 
level; the gate- and transistor-levels of the design are validated only through 
simulation and informal reviews [24].

We make two uses of informal rules. First, an informal rule can provide an al- 
gorithmic encoding of domain knowledge where a formalization in logic would be 
unacceptably time-consuming. For example, the timing analysis procedure that 
we used derives a graph whose nodes correspond to the channel connected re- 
gions of the transistor-level circuit. The circuit topology is syntactically encoded 
in the text of the ST program, and the procedure derives timing bounds through 
graph traversal. The correspondence between the graph and the original circuit 
and the soundness of the graph traversal have only been shown informally. 

Second, we use several ‘semi-formal’ rules for reasoning about ST programs. 
For instance, the proof rules for reasoning about invariants, safety properties, 
and refinements are founded on theorems that were formally proven (although 
the proofs have not been mechanically checked). These theorems are based on a
formal semantics of a core language only, and their extension to the full language 
with records, arrays, functions, and modules is informal. 

In our framework, informal inference rules and decision procedures can be
seen as a generalization of the concept of using a hypothesis in a proof: usually,
a hypothesis is simply a formula that is assumed to be valid. An informal rule, in
contrast, is an algorithm that is assumed to permit only sound inferences
(e.g. by generating a valid formula and introducing it as an assumption).

2.2 Syntactic embedding of the HDL 

Formal verification requires a description of the design as a formula in the ap- 
propriate logic. If it is not practical to describe the design directly in logic [9], 
e.g. because of lack of tool support for simulation, synthesis, etc., an embedding of
the HDL in the logic has to be devised. Such embeddings are commonly divided
into two classes [6]. In a deep embedding, both the (abstract) syntax of the HDL
as well as its semantic interpretation are defined within the logic in terms of an 
abstract data type and a semantic function, respectively. This provides a very 
rigorous embedding and allows meta-reasoning about the HDL semantics. How- 
ever, the effort for producing such an embedding can be substantial, although it 
may be possible to amortize this effort over many designs. 

In a shallow embedding, in contrast, the semantic interpretation of the HDL
occurs outside the logic. Shallow embeddings can be easier to implement than 
deep embeddings because the translation process is informal with a correspond- 
ing loss of rigour. Because program structures are not represented in the logic, 
theorems that refer to the syntactic structure of the HDL description can be 
neither stated nor proven [6]. 
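The difference between the two classes can be seen in miniature if we let Python stand in for the logic. In the deep embedding below, the syntax of a toy Boolean HDL is a data type and its meaning is a function over that type, so a syntactic property such as `size` can be stated. In the shallow embedding, the same NAND gate is translated directly into its meaning, and its syntax is gone. The toy language and all names are ours, chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Union


# Deep embedding: abstract syntax as a data type ...
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Not, And]


# ... plus a semantic function defined over it.
def meaning(e: Expr, env: dict) -> bool:
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Not):
        return not meaning(e.arg, env)
    return meaning(e.left, env) and meaning(e.right, env)


def size(e: Expr) -> int:
    """A syntactic property: expressible only because syntax is represented."""
    if isinstance(e, Var):
        return 1
    if isinstance(e, Not):
        return 1 + size(e.arg)
    return 1 + size(e.left) + size(e.right)


# Shallow embedding: the description is translated straight into its meaning.
def shallow_nand(env: dict) -> bool:
    return not (env["a"] and env["b"])


deep_nand = Not(And(Var("a"), Var("b")))
```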

We propose a third variant, a syntactic embedding: The syntax of the HDL 
becomes part of the syntax of the logic (see section 3.3 for the embedding of ST). 
As in a shallow embedding, the semantic interpretation is informal. However, the 
procedures that perform this interpretation are encapsulated as domain-specific 
inference rules. This provides a tighter integration with the prover than could 
be achieved with a shallow embedding. However, as with shallow embeddings,
no meta-reasoning about the semantics of the specification language is possible. 

We have found that a syntactic embedding simplifies the implementation of 
semi-formal or informal inference rules. Such rules are often based on syntactic 
analysis of the underlying program. These rules are easier to implement, and 
hopefully less prone to implementation errors, because the abstract syntax of 
the program is immediately available in the syntactic embedding. 

2.3 Merging of Decision Procedures and Inference Rules 

Traditional mechanized theorem provers generally use only decision procedures 
in the classic sense of an algorithm that decides the validity of a formula. Such 
decision procedures are used to discharge proof obligations in a single automatic 
step, i.e. they operate on the leaves of a proof tree. Proof steps interior to the 
proof tree, however, are generally justified by matching them with an inference 
rule schema, and possibly checking side conditions or provisos. 

We remove the restriction of decision procedures to leaf obligations and al- 
low inference rules to use arbitrary algorithms to decide the soundness of a proof 
step. Theoretically, lifting this restriction has no significance; such an “inference 
procedure” can be replaced by the corresponding leaf decision procedures, and 
inferences using propositional logic. However, there are significant practical ad- 
vantages to our approach. In many cases, it is convenient to let the inference 
rule compute the derived obligations rather than requiring the user to provide 
them. Of course, one could perform two computations of the derived obligation: 
one outside of the trusted core to derive the result for the user, and the other in 
the core to verify the result. Such an approach has obvious disadvantages with 
respect to efficiency and software maintenance. These problems would be par- 
ticularly severe in a framework such as ours where ease of adding and extending 
domain-specific inference rules and decision procedures is important. Our “in- 
ference procedures” provide a simple mechanism for avoiding these problems. 
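As an illustration of an inference rule that computes its derived obligations, the following Python sketch works on a toy formula syntax of nested tuples. This syntax is our stand-in, not the prover's actual ST logic. Given the abstract syntax of a program and a candidate invariant I, the hypothetical `invariant_rule` produces one obligation I ∧ G ⇒ I ∘ M per transition, following the wp equations of Section 1.3. The user therefore does not have to state these obligations. A brute-force evaluation over a small range stands in for a decision procedure that would discharge them.

```python
def subst(f, sigma):
    """Simultaneous substitution of variables by terms (applying M)."""
    tag = f[0]
    if tag == "var":
        return sigma.get(f[1], f)
    if tag == "const":
        return f
    return (tag, subst(f[1], sigma), subst(f[2], sigma))


def invariant_rule(program, inv):
    """Compute the obligations inv & G => inv o M, one per transition,
    directly from the program's abstract syntax."""
    return [("implies", ("and", inv, guard), subst(inv, assign))
            for guard, assign in program]


def evaluate(f, env):
    tag = f[0]
    if tag == "var":
        return env[f[1]]
    if tag == "const":
        return f[1]
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    if tag == "+":
        return a + b
    if tag == ">":
        return a > b
    if tag == "==":
        return a == b
    if tag == "and":
        return a and b
    if tag == "implies":
        return (not a) or b
    raise ValueError(f"unknown operator {tag}")


X, Y = ("var", "x"), ("var", "y")
SWAP = [((">", X, Y), {"x": Y, "y": X})]  # << x > y -> x, y := y, x >>
INV = ("==", ("+", X, Y), ("const", 3))
OBLIGATIONS = invariant_rule(SWAP, INV)
```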

3 Prototype Implementation 

We have implemented a proof-of-concept verification environment for our ap- 
proach. It has three architectural components. A generic core provides proof 
state and theorem objects, as well as a tactic interface. The second component 
is a library of common decision procedures, while the third comprises the code 
that is specific to a particular object logic. The system has been implemented 
in Standard ML of New Jersey [4], which also forms the user-interface for the 
proof checker. 

3.1 Generic Core 

Similar to theorem proving environments such as HOL, PVS or Isabelle [10,16,17], 
a (backwards-style) proof in our proof checker is represented by a sequence of 
proof states. A proof state consists of the claim, the pending obligations, and
some bookkeeping information. The claim and obligations are judgments which 
can be, for instance, a sequent (in a sequent calculus), or a formula (in a natural 
deduction style calculus). In the initial proof state of a proof, the list of pend- 
ing obligations consists only of the claim. Rules of inference are implemented as 
functions from proof state to proof state, and are used to transform one or more 
pending obligations into zero or more (simpler) obligations. The available proof 
rules are registered with the claim state and cannot be modified afterwards; 
in a sense, they become hypotheses of the theorem. This permits user-defined 
domain-specific proof rules to be introduced without modification of the core. 

A proof state with no pending obligations corresponds to a proven claim, i.e. 
a theorem. To allow for theorems to be used in later proofs without having to 
check, and therefore execute, their proof before each use, we provide theorem 
objects, which associate a claim with a proof, i.e. a function that takes the claim 
proof state and returns a proof state with no pending obligations. Theorems 
can only be used in a proof if they were imported into the initial proof state. 
We provide facilities that analyze the dependency between theorems, ensure the 
absence of circularity, check all proofs that a theorem depends on, and generate 
reports. 
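The dependency analysis mentioned above boils down to a depth-first traversal of the theorem dependency graph that rejects cycles and yields an order in which proofs can be checked. Here is a minimal Python sketch of that idea; it is not the tool's SML code, and the theorem names are hypothetical.

```python
def check_order(deps):
    """Return theorems so that each follows the theorems it depends on;
    raise ValueError if the dependencies are circular."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / done
    colour, order = {}, []

    def visit(t):
        c = colour.get(t, WHITE)
        if c == GREY:
            raise ValueError(f"circular dependency through {t}")
        if c == BLACK:
            return
        colour[t] = GREY
        for d in deps.get(t, ()):
            visit(d)
        colour[t] = BLACK
        order.append(t)  # all dependencies of t are already in `order`

    for t in deps:
        visit(t)
    return order


# Hypothetical theorem names, for illustration only.
DEPS = {
    "divider_correct": ["srt_step", "refinement"],
    "refinement": ["srt_step"],
    "srt_step": [],
}
```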

All of the above components are parameterized in the syntax of the logic 
and a well-formedness predicate for proof obligations. The parameterization is 
realized through SML functors. 

To facilitate the interactive development of proofs, we provide a simple goal 
package, which maintains a current proof-state to which rules can be applied, 
and allows proof steps to be undone. As indicated above, a proof in our system 
is an SML function from proof states to proof states. We provide a library of
higher-order functions on proof rules (analogous to tacticals in e.g. HOL or 
Isabelle) which facilitate the construction of proofs from basic proof rules (which 
correspond to HOL tactics). 
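Combinators of this kind can be illustrated with a small Python sketch. It is modelled on the familiar THEN/ORELSE/REPEAT tacticals and is not the paper's SML library. To keep it short, a proof state is simplified to a tuple of pending obligations, and a rule signals inapplicability by raising `RuleFailed`. All names are hypothetical. A minimal goal package with undo is included as well.

```python
from typing import Callable, Tuple

# Simplified setting: a proof state is a tuple of pending obligations and a
# rule raises RuleFailed when it does not apply.
State = Tuple[object, ...]
Rule = Callable[[State], State]

class RuleFailed(Exception):
    pass

def then(r1: Rule, r2: Rule) -> Rule:
    """Apply r1, then r2 to the result."""
    return lambda s: r2(r1(s))

def orelse(r1: Rule, r2: Rule) -> Rule:
    """Try r1; if it fails, apply r2 to the original state."""
    def rule(s):
        try:
            return r1(s)
        except RuleFailed:
            return r2(s)
    return rule

def repeat(r: Rule) -> Rule:
    """Apply r until it fails (loops forever if r always succeeds)."""
    def rule(s):
        while True:
            try:
                s = r(s)
            except RuleFailed:
                return s
    return rule

class Goal:
    """Minimal goal package: a current state plus history for undo."""
    def __init__(self, state: State):
        self.history = [state]

    @property
    def current(self) -> State:
        return self.history[-1]

    def apply(self, rule: Rule) -> State:
        self.history.append(rule(self.current))
        return self.current

    def undo(self) -> State:
        if len(self.history) > 1:
            self.history.pop()
        return self.current

# Two toy rules on obligations written as nested tuples (hypothetical).
def split(s: State) -> State:
    if s and isinstance(s[0], tuple) and s[0][0] == "and":
        return (s[0][1], s[0][2]) + s[1:]
    raise RuleFailed()

def drop_true(s: State) -> State:
    if s and s[0] == "true":
        return s[1:]
    raise RuleFailed()

auto = repeat(orelse(split, drop_true))
print(auto((("and", "true", ("and", "true", "true")),)))  # ()
```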

3.2 Library of Common Decision Procedures 

This library comprises core routines of several commonly used decision proce- 
dures. The library is independent of a particular object logic; instantiating a 
decision procedure for a logic requires writing a small amount of interface code. 

To support Boolean tautology checking as well as symbolic model checking, 
the library provides an abstract data type for boolean expressions in a canonical 
representation. The underlying implementation of this data type is a state-of-the-art
BDD package [23] that was integrated into the SML/NJ runtime system. The
interface provides full access to the control aspects of the BDD package, such 
as variable reordering strategies, cache sizes etc. Based on the BDD package, 
we have implemented a package for symbolic manipulation of bit-vectors and 
arithmetic operations thereon.
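The following pure-Python reduced ordered BDD is *not* the package of [23]. It is a minimal sketch meant to show why a canonical representation makes tautology checking trivial. With a fixed variable order and a unique table (hash-consing), equivalent formulas yield the same node, so a formula is a tautology exactly when it reduces to the TRUE terminal. The class and method names are illustrative.

```python
class BDD:
    """Minimal reduced ordered BDD; nodes are ints, 0 = FALSE, 1 = TRUE."""

    def __init__(self, num_vars: int):
        self.num_vars = num_vars
        # Terminals carry variable index num_vars, i.e. below every real var.
        self.nodes = [(num_vars, None, None), (num_vars, None, None)]
        self.unique = {}      # (var, low, high) -> node id
        self.ite_cache = {}

    def mk(self, var, low, high):
        if low == high:                   # redundant test: reduce
            return low
        key = (var, low, high)
        if key not in self.unique:        # hash-consing gives canonicity
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, i):
        return self.mk(i, 0, 1)

    def _cofactors(self, u, v):
        var, low, high = self.nodes[u]
        return (low, high) if var == v else (u, u)

    def ite(self, f, g, h):
        """If-then-else, the universal BDD operation."""
        if f == 1:
            return g
        if f == 0:
            return h
        if g == 1 and h == 0:
            return f
        if g == h:
            return g
        key = (f, g, h)
        if key not in self.ite_cache:
            v = min(self.nodes[f][0], self.nodes[g][0], self.nodes[h][0])
            f0, f1 = self._cofactors(f, v)
            g0, g1 = self._cofactors(g, v)
            h0, h1 = self._cofactors(h, v)
            self.ite_cache[key] = self.mk(v, self.ite(f0, g0, h0),
                                          self.ite(f1, g1, h1))
        return self.ite_cache[key]

    def neg(self, f):
        return self.ite(f, 0, 1)

    def conj(self, f, g):
        return self.ite(f, g, 0)

    def disj(self, f, g):
        return self.ite(f, 1, g)

    def implies(self, f, g):
        return self.ite(f, g, 1)

    def is_tautology(self, f):
        return f == 1
```

For example, `b.disj(x, b.neg(x))` collapses to node `1`, and De Morgan's laws hold as equality of node ids.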

Components for arithmetic decision procedures include a package for ar- 
bitrary precision integer and rational arithmetic, polynomials, and a decision 
procedure for linear arithmetic. 
Based on these procedures, we have implemented a decision procedure that
discharges arbitrary tautologies composed of linear inequalities with boolean 
connectives. We have not implemented a decision procedure for combinations 
of theories (e.g. [14,22]) as our simple procedures were sufficient for the divider 
proof. All decision procedures include counter-example facilities for non-valid 
formulas. 
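The text does not say which algorithm the linear-arithmetic procedure uses. Fourier-Motzkin elimination is one classical choice, and the sketch below uses it to decide satisfiability of a conjunction of strict and non-strict inequalities over the rationals. Validity of `H ⇒ G` is checked as unsatisfiability of `H ∧ ¬G`. This is an assumption-laden illustration. It handles only a conjunctive negated goal (general boolean structure would need case splitting, e.g. via DNF). It produces no counter-examples, and it works over the rationals, not the integers.

```python
from fractions import Fraction
from itertools import product

# An inequality is a triple (coeffs, bound, strict) meaning
#   sum(coeffs[v] * v) <  bound   if strict
#   sum(coeffs[v] * v) <= bound   otherwise.

def _eliminate(ineqs, var):
    """One Fourier-Motzkin step: project the system onto the other variables."""
    pos, neg, rest = [], [], []
    for ineq in ineqs:
        a = ineq[0].get(var, 0)
        (pos if a > 0 else neg if a < 0 else rest).append(ineq)
    for (cp, bp, sp), (cn, bn, sn) in product(pos, neg):
        ap, an = cp[var], -cn[var]            # both positive
        coeffs = {}
        for v in set(cp) | set(cn):
            c = an * cp.get(v, 0) + ap * cn.get(v, 0)
            if v != var and c != 0:
                coeffs[v] = c
        # The combination is strict if either source inequality is strict.
        rest.append((coeffs, an * bp + ap * bn, sp or sn))
    return rest

def satisfiable(ineqs):
    """Decide satisfiability over the rationals by eliminating every variable."""
    ineqs = [({v: Fraction(c) for v, c in cs.items() if c != 0}, Fraction(b), s)
             for cs, b, s in ineqs]
    for var in sorted({v for cs, _, _ in ineqs for v in cs}):
        ineqs = _eliminate(ineqs, var)
    # Only constant constraints 0 < b or 0 <= b remain.
    return all(b > 0 if strict else b >= 0 for _, b, strict in ineqs)

def valid(hypotheses, negated_goal):
    """hypotheses => goal is valid iff hypotheses AND (not goal) is unsatisfiable."""
    return not satisfiable(list(hypotheses) + list(negated_goal))
```

For example, `x ≤ y ∧ y ≤ z ⇒ x ≤ z` is valid: its negation adds `z − x < 0`, and elimination derives `0 < 0`.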



3.3 Object Logic for Synchronized Transitions 

We have instantiated the generic core with a logic suitable for reasoning about ST 
programs. The proof system is a sequent calculus for explicitly typed first-order 
logic that is extended with all types, constants and operators of ST, including 
transition-valued expressions.

Assertions on ST programs, such as invariants, safety properties and refinement,
are formulated in terms of predicates on transition-valued expressions. We
provide proof rules, such as the wp-based rule for invariants, that allow such
obligations to be reduced to obligations that are purely within quantifier-free
logic with boolean connectives, arithmetic, if-expressions, and arrays and records
under store and select.

As an example, consider a proof state that includes the pending obligation: 

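$$\mathit{HasInvariant}\big(\langle\!\langle i > 0 \rightarrow i := i - 1 \rangle\!\rangle \parallel \langle\!\langle i < N \rightarrow i := i + 1 \rangle\!\rangle,\; 0 \le i \le N\big)$$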

This obligation states that the two transitions maintain the given invariant. An
application of the proof rule for HasInvariant rewrites this obligation as

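$$(0 \le i \le N) \Rightarrow \mathit{wp}\big(\langle\!\langle i > 0 \rightarrow i := i - 1 \rangle\!\rangle \parallel \langle\!\langle i < N \rightarrow i := i + 1 \rangle\!\rangle,\; 0 \le i \le N\big)$$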

An application of the proof rule for wp, which implements the semantics given 
in section 1.3, yields: 

$$(0 \le i \le N) \Rightarrow \big(((i > 0) \Rightarrow (0 \le i - 1 \le N)) \wedge ((i < N) \Rightarrow (0 \le i + 1 \le N))\big)$$

This last obligation can be discharged using the decision procedure for linear 
inequalities with boolean connectives. 
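The effect of the wp rule on this example can be replayed in a small Python sketch. Guarded assignments are modelled as pairs of functions on states. In our reading of the semantics, the asynchronous combination requires every transition to establish the postcondition. The invariant is read as `0 ≤ i ≤ N` over the integers. The final check is only an exhaustive test over a small box of values. It is not a decision procedure and not the tool's method.

```python
from itertools import product

# A guarded assignment <<g -> x := e>> is modelled as (guard, update), with
# states represented as dicts from variable names to integers.
def wp_transition(transition, post):
    guard, update = transition
    # wp(<<g -> x := e>>, P) = g => P[e/x]
    return lambda s: (not guard(s)) or post(update(s))

def wp_async(transitions, post):
    # For T1 || T2 either transition may be chosen, so each must establish P.
    pres = [wp_transition(t, post) for t in transitions]
    return lambda s: all(p(s) for p in pres)

def inv(s):
    return 0 <= s["i"] <= s["N"]

dec = (lambda s: s["i"] > 0, lambda s: {**s, "i": s["i"] - 1})
inc = (lambda s: s["i"] < s["N"], lambda s: {**s, "i": s["i"] + 1})

def obligation(s):
    # (0 <= i <= N) => wp(dec || inc, 0 <= i <= N)
    return (not inv(s)) or wp_async([dec, inc], inv)(s)

def bounded_check(bound=6):
    """Test the obligation for all |i|, |N| <= bound (a sanity check, not a proof)."""
    return all(obligation({"i": i, "N": n})
               for i, n in product(range(-bound, bound + 1), repeat=2))

print(bounded_check())  # True
```

Dropping the guard of `inc` makes the obligation fail at `i = N`, which is exactly the case the guard excludes.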

Further proof rules include the usual rules for sequent manipulations, rewrites, 
simplification and lifting of if-expressions, quantifier manipulations, and arithmetic
simplifications. Together with decision procedures for propositional calculus
and linear arithmetic, these are frequently sufficient to discharge obligations
arising from assertions about ST programs. More specialized proof rules will 
be explained briefly in the context of the divider verification presented in the 
remainder of the paper. 
4 Example: Proving a Self-Timed Divider Correct

We evaluated the proof checker by verifying Williams’ self-timed divider [27], 
which implements the radix-2 SRT algorithm [7]. We reconstructed the design 
from the descriptions in [27] and [28]. A variation of this design is incorporated 
in the HAL SPARC CPU. 

4.1 Description of the Divider 

As shown in figure 1, the divider consists of three identical stages, each of which
performs the computation of a single iterative step of the SRT division algorithm,
and which pass intermediate results around in a circular fashion. Each stage
computes a new partial remainder (in carry-save representation) and quotient digit,
based on the result of the previous iteration which it receives from the preceding
stage. The design is self-timed [21], with signals encoded as dual-rail values [26],
and implemented in precharged logic [28].

Fig. 1. Divider Architecture

The precharge control block sequences the iterative computation. This block 
reads the stage completion signals and regulates the operation of the stages 
through the precharge control signals. In each iteration, three steps of the SRT 
algorithm are computed. 

Governed by the precharge control signals, each stage is in one of three states: 
precharge, evaluate, or hold. The “precharge bar” signal for stage i is pb(i). 
When pb(i) is low, stage i is precharging. Precharging leads to a state where 
every dual-rail signal produced by the stage has the “empty” value. Evaluation 
leads to a state where every signal has a “valid” value. A stage in the holding 
state leaves its outputs unchanged so that its successor can use them to compute 
the next partial remainder and quotient digit. A simple invariant that captures 
this sequencing is central in many of our proofs. 
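Dual-rail encoding can be illustrated with a short sketch. The paper does not spell out its rail convention. We assume the common one: each bit is carried on a (true-rail, false-rail) pair, both rails low is "empty" (the precharged state), exactly one rail high is a valid 0 or 1, and both high is illegal. The function names are hypothetical. `to_int` plays the role of an abstraction map from a completely valid dual-rail word to an integer.

```python
EMPTY = (0, 0)   # both rails low: no data, the result of precharging

def encode(bit: int):
    """Dual-rail code of a valid bit as (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)

def decode(pair):
    """Return 0 or 1 for a valid code, None for 'empty'."""
    if pair == EMPTY:
        return None
    if pair == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    return 1 if pair == (1, 0) else 0

def all_empty(word):
    # Holds after a stage has finished precharging.
    return all(p == EMPTY for p in word)

def all_valid(word):
    # Holds after a stage has finished evaluating.
    return all(p in ((1, 0), (0, 1)) for p in word)

def to_int(word):
    """Abstraction map: fully valid dual-rail word (LSB first) to an integer."""
    if not all_valid(word):
        raise ValueError("word is not completely valid")
    return sum(decode(p) << k for k, p in enumerate(word))
```

For instance, `to_int([encode(b) for b in (1, 0, 1)])` is 5.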

Williams employed two optimizations to improve the performance of the 
divider. First, he assumed that a stage can precharge faster than its predecessor 
can evaluate. Second, he assumed that the quotient digit of a stage will be the 
last output to change during the evaluation phase. The first optimization allows 
stage i+1 to precharge in parallel with the evaluation phase of stage i. If no 
timing assumptions were made, these operations would have to be performed 
sequentially. The second optimization allows the computation of stage i+1 to 
start as soon as the quotient digit from stage i is output, without any extra 
hardware to check the completion status of the partial remainder. Due to these 
optimizations, verifying the functionality of the divider includes proof obligations 
that require timing analysis. This timing analysis establishes relative orderings 
of events in the operation of the divider and shows that the assumptions on 
which the optimizations are based are indeed correct. 
4.2 A Refinement Hierarchy for the Divider

The transistor-level model of the divider is too large to permit model checking,
and too complicated to verify from first principles using a theorem prover.
Therefore it is desirable to prove safety properties on a more abstract,
higher-level model and show that these properties hold in the more detailed
models. We used a hierarchy of models as depicted in figure 2 to verify the
divider. Arrows indicate verification obligations: vertical arrows correspond to
refinement proofs, horizontal arrows indicate other properties that either
establish correctness or assist in the refinement proofs.

The first two refinement steps are data refinements. Our top-level model has a
single stage which computes a quotient digit and the next partial remainder in
each step. The divisor, dividend, and remainder have rational values. In the first
refinement step, we replace the rational values with integer values, and the next
refinement step replaces these integers with bit-vectors.

Fig. 2. Verification Hierarchy

The next two models elaborate upon the self-timed handshaking protocols 
used in the design. The speed-independent model has three divider stages and 
implements a handshaking protocol that does not depend on the timing delays 
of the components. In the timed, word-level model, bounds are given on the ratio 
of precharge time to evaluation time. 

The lowest-level model corresponds directly to our transistor-level implemen- 
tation of the divider chip. Variables in this model are represented using dual-rail 
code. In the higher-level models, the remainder word is computed as a single,
atomic action. Here, each signal is set independently. In this transistor-level
model, a stage’s completion status is determined solely by the quotient digit 
output. 

4.3 Functional Correctness 

Figure 3 depicts the ST code of our top-level, synchronous divider model. In
radix-2 SRT division, each quotient digit can have the value −1, 0 or 1 (see [7]).
If the current remainder R_i is greater than or equal to 0, 1 is a valid quotient
digit choice. If the remainder is negative, −1 is a valid choice for the next
quotient digit. If 2|R_i| ≤ D, the quotient digit can also be 0. In our synchronous
model of the divider this overlapping choice for the digit is represented by three
transitions combined with the asynchronous combinator (see fig. 3). For example,
if the current remainder is equal to −0.2 * D, then either the transition that
selects 0 or the one that selects −1 may be chosen for the next step. By using
non-determinism, we avoid cluttering this description with implementation details,
and at the same time modularize and simplify the proofs. Deterministic quotient
digit selection is introduced in the synchronous, bit-vector model.

    currRem : currRemF (* q: ...; R, D: RATIONAL *) =
    BEGIN
      { q = -1 → 2 * R + D,
        q = 0  → 2 * R,
        q = 1  → 2 * R - D }
    END;

    SRTDivide : SRTDivideC (* q: ...; R, D: RATIONAL *) =
    BEGIN
         ⟪ 0 ≤ currRem(R, q, D)          → R, q := currRem(R, q, D), 1 ⟫
      ‖  ⟪ -D ≤ 2 * currRem(R, q, D) ≤ D → R, q := currRem(R, q, D), 0 ⟫
      ‖  ⟪ currRem(R, q, D) < 0          → R, q := currRem(R, q, D), -1 ⟫
    END;

Fig. 3. Synchronous Word Level Model

The following two properties are invariants of the synchronous divider model:

$$\text{(t)}\quad |R_i| \le D$$

$$\text{(m)}\quad 2C - D \sum_{j} q_j 2^{-j} = R_i 2^{-i}$$

where R_i is the remainder determined in iteration i and q_j is the quotient digit
selected in iteration j. From these two invariants and the initial condition that
the divisor D and dividend C are normalized to satisfy 1/2 ≤ D < 1 and
0 ≤ C < D, we proved that the computed quotient asymptotically approaches the
true quotient C/D.
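The convergence argument can be checked numerically with exact rational arithmetic. This sketch uses the digit rules stated in Section 4.3, reading the bound for digit 0 as non-strict (an assumption). It models the non-deterministic choice with a pluggable `choose` function. It uses its own indexing (r_0 = C, r_k = 2r_{k−1} − q_k D), which may differ from the paper's. Under that indexing the analogue of (m) is C − D Σ_{j=1}^{k} q_j 2^{−j} = r_k 2^{−k}, so |C/D − Σ q_j 2^{−j}| ≤ 2^{−k}.

```python
from fractions import Fraction
import random

def valid_digits(r, d):
    """Quotient digits allowed for remainder r (the overlapping SRT choice)."""
    digits = []
    if r >= 0:
        digits.append(1)
    if -d <= 2 * r <= d:
        digits.append(0)
    if r < 0:
        digits.append(-1)
    return digits

def srt_divide(c, d, steps, choose=random.choice):
    """Run `steps` radix-2 SRT iterations, checking both invariants each step.

    Indexing (an assumption of this sketch): r_0 = c and
    r_k = 2 * r_{k-1} - q_k * d, so that c - d * sum(q_j * 2**-j) == r_k * 2**-k.
    """
    if not (Fraction(1, 2) <= d < 1 and 0 <= c < d):
        raise ValueError("operands are not normalized")
    r, quotient, digits = Fraction(c), Fraction(0), []
    for k in range(1, steps + 1):
        q = choose(valid_digits(r, d))          # any enabled transition may fire
        r = 2 * r - q * d
        quotient += Fraction(q, 2 ** k)
        digits.append(q)
        assert abs(r) <= d                      # analogue of invariant (t)
        assert c - d * quotient == r / 2 ** k   # analogue of invariant (m)
    return quotient, digits
```

Because every guarded choice keeps |r| ≤ D, any schedule of digit choices yields a quotient within 2^−k of C/D after k steps.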

4.4 Refinement Proofs 

This section gives short overviews of the refinement proofs and mentions key 
problems within each proof. It is this chain of refinement proofs which establishes 
that the functional correctness proven on the abstract, synchronous model also 
applies to the transistor-level model. The divider models will be referred to as 
rational divider, integer divider, bit-vector divider, speed-independent divider, 
timed divider and transistor-level divider. 

In our approach, refinement is a safety property. To establish refinement, we
must first show that initial states of the lower-level model correspond to legal
initial states of the higher-level model. Then, we must show that for each
transition that can be performed by the lower-level model, there is a corresponding
transition of the higher-level model, or that it is a stuttering move [2] of the
higher-level model. These proof obligations are derived automatically by one of
the proof rules that encodes the semantics of our logic for ST.
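The paper discharges these two obligations deductively. The following explicit-state Python sketch only shows their shape on small finite systems: initial states must map to high-level initial states, and every reachable low-level step must map either to a high-level step or to a stutter (no change in the abstract state). The function names and the two-phase-counter example are hypothetical.

```python
def check_refinement(low_init, low_next, high_init, high_next, abstract):
    """Explicit-state refinement check with stuttering.

    low_next / high_next map a state to an iterable of successor states and
    abstract maps a low-level state to a high-level one. Returns None if every
    reachable low-level step refines a high-level step (or stutters), otherwise
    a counterexample tuple.
    """
    for s in low_init:
        if abstract(s) not in high_init:
            return ("init", s)
    seen, frontier = set(low_init), list(low_init)
    while frontier:
        s = frontier.pop()
        a = abstract(s)
        for t in low_next(s):
            b = abstract(t)
            # Neither a stutter nor a legal high-level step: refinement fails.
            if b != a and b not in set(high_next(a)):
                return ("step", s, t)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None

# Hypothetical example: a two-phase counter refines a counter modulo 4.
def high_next(n):
    return {(n + 1) % 4}

def low_next(state):
    n, phase = state
    return {(n, 1)} if phase == 0 else {((n + 1) % 4, 0)}

def abstract(state):
    return state[0]

print(check_refinement({(0, 0)}, low_next, {0}, high_next, abstract))  # None
```

The first phase of each low-level increment leaves the abstract value unchanged and is accepted as a stuttering move.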

Because refinement is a safety property, we can assume that if the state of
the lower-level model before a transition is performed maps to a state of the
higher-level model, it satisfies any safety properties that have been established
for the higher-level model. This allows us to use safety properties of the
higher-level model in the proof of refinement. This is very helpful for our proofs:
for example, arithmetic properties that are established for the top-level models
can be used when verifying the other models. Likewise, invariants that are
established on intermediate-level models can be used when verifying lower-level
models. Because of this, the verification of refinement is often simply a matter
of tautology checking.



Refinement between the Rational Divider and the Integer Divider. To
convert the integer values in the integer divider to the rational-valued variables
in the rational divider, one simply divides by a fixed power of two. To prove that
the integer-valued divider is a refinement of the rational-valued one, it must be
shown that overflows do not happen. However, this is implied by the (suitably
scaled) safety property $|R_i| \le D$, which the integer divider model inherits
from the rational divider.



Refinement between the Integer Divider and the Bit-Vector Divider.
In the bit-vector divider, carry-save representation is used for the remainder
value. The abstraction mapping adds the carry and sum words to determine
the remainder value at the integer level. Furthermore, the next quotient digit is
computed deterministically in the bit-vector model based on the top four bits of
the carry-save adder without resolving the carry of the bottom bits. Thus only
the top four bits need to be resolved in a carry-propagate adder. Figure 4 shows
the transitions of the quotient selection logic. Depending on the top four bits
of cpaSum, the output of the four-bit carry-propagate adder, the next quotient
digit is either 1, 0 or −1. For the refinement proof it must be shown that for each
quotient digit choice of the bit-vector model, an equivalent choice can be made
by the higher-level model.
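The carry-save idea can be sketched on unsigned machine words. A 3:2 step produces a sum word and a carry word whose total equals the true sum, with no carry propagation. Adding only the top k bits of the two words gives the exact top bits or a value one smaller, because the carry out of the low bits is ignored. Word widths, the two's-complement reading, and all function names are assumptions of this sketch; the paper does not give them.

```python
def csa(a: int, b: int, c: int, width: int):
    """One carry-save (3:2) step on unsigned `width`-bit words.

    Returns (sum_word, carry_word) with sum_word + carry_word equal to
    a + b + c modulo 2**width; no carry is propagated between bit positions.
    """
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word

def abstraction(sum_word: int, carry_word: int, width: int) -> int:
    """Integer-level remainder: add the two words (two's complement assumed)."""
    v = (sum_word + carry_word) & ((1 << width) - 1)
    return v - (1 << width) if v >> (width - 1) else v

def top_bits_estimate(sum_word: int, carry_word: int, width: int, k: int = 4) -> int:
    """Add only the top k bits of each word (a small carry-propagate adder).

    The carry out of the low width-k bits is ignored, so the estimate equals
    the exact top k bits or is one less (modulo 2**k).
    """
    shift = width - k
    return ((sum_word >> shift) + (carry_word >> shift)) & ((1 << k) - 1)
```

The "off by at most one" property of the estimate is what makes the overlapping digit choice of the higher-level model essential.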



    QSL : QSLC =
    BEGIN
         ⟪ ¬cpaSum(2) ∧ ¬(cpaSum(3) ∧ cpaSum(1) ∧ cpaSum(0)) → qi := 1 ⟫
      ‖  ⟪ cpaSum(2) ∧ cpaSum(1) ∧ cpaSum(0) → qi := 0 ⟫
      ‖  ⟪ (cpaSum(2) ∧ ¬(cpaSum(1) ∧ cpaSum(0)))
           ∨ (cpaSum(3) ∧ ¬cpaSum(2) ∧ cpaSum(1) ∧ cpaSum(0)) → qi := -1 ⟫
    END;

Fig. 4. Quotient Selection Logic in Bit-Vector Word Level Model
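Determinism of the quotient selection logic can be confirmed by brute force, since the guards depend on only four bits. The sketch transcribes the guards of Fig. 4 as we read them from the figure, taking cpaSum(3) as the most significant bit. It then checks that exactly one guard is enabled for each of the 16 bit patterns. The function names are hypothetical.

```python
from itertools import product

def qsl_guards(b3: bool, b2: bool, b1: bool, b0: bool):
    """Guards of the three QSL transitions, keyed by the digit they select."""
    return {
        1: (not b2) and not (b3 and b1 and b0),
        0: b2 and b1 and b0,
        -1: (b2 and not (b1 and b0)) or (b3 and not b2 and b1 and b0),
    }

def selection_table():
    """Map every 4-bit pattern to its unique digit; fail if not deterministic."""
    table = {}
    for bits in product((False, True), repeat=4):
        enabled = [q for q, g in qsl_guards(*bits).items() if g]
        if len(enabled) != 1:
            raise AssertionError(f"bits {bits}: enabled digits {enabled}")
        table[bits] = enabled[0]
    return table
```

With these guards, 7 patterns select 1, 2 select 0 and 7 select −1.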
Several safety properties of the higher-level models are used to bound the
values of the divisor and partial remainder at each iteration. Combined with
properties of the abstraction mapping, refinement is straightforward to show.
The proof obligations were discharged by the combination of a proof rule that
reduces arithmetic operations on bit-vectors to BDDs, and the BDD-based
tautology checker.



Refinement between the Bit-Vector Divider and the Speed-Independent
Divider. The speed-independent model consists of three divider stages and all
control is performed by explicit handshaking without any timing assumptions. 
For the abstraction mapping it is necessary to determine which stage’s output 
to map to the output of the synchronous model’s only stage. Intuitively, the 
precharge control ensures that at any time, there is a stage whose output value 
is the last partial remainder computed, and this stage can be identified by the 
state of the precharge control. We verified a hand-written invariant to show that 
the control logic operates as intended. We then defined an abstraction function 
that selected the appropriate output value for the partial remainder based on the 
state of the precharge control. Using this abstraction function, the refinement 
property was easily proven. 



Refinement between the Speed-Independent Divider and the Timed 
Divider. In the speed-independent model, the precharge control block performs 
an explicit check to ensure that stage i+1 is done precharging (i.e. its outputs are 
empty) before stage i starts evaluating. The timed model starts both operations 
in parallel, and timing bounds are used to ensure that precharging completes 
before evaluation. This corresponds to Williams’ first optimization in the design 
of the chip, as discussed in section 4.1. 

We use the approach of [3] to model time: a real-valued variable is added to 
the program to model the current time, transition guards are strengthened to 
express lower bounds on delays, and an action for advancing time is defined so 
as to observe upper bounds on delays (i.e. time may not progress beyond the 
maximum delay for a pending action). In this model, the clause of the guard for
the evaluate action that asserted that the successor stage is done precharging is 
replaced by a clause that states that the successor stage started precharging suf- 
ficiently far in the past. We then verified an invariant that implies that whenever 
this timing condition is satisfied, the successor stage has finished precharging. 
With this invariant, refinement was easily verified (see [15] for details). 
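The timing model can be illustrated by a simplified Python reading of the approach of [3]. A clock `now` is added. Each pending action records when it became pending. Its firing guard is strengthened with a lower delay bound, and time may not advance past the upper bound of any pending action. Class and method names, and the numeric bounds in the example, are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Pending:
    lower: float   # earliest firing time, relative to becoming pending
    upper: float   # latest firing time, relative to becoming pending
    since: float   # value of `now` when the action became pending

@dataclass
class TimedModel:
    now: float = 0.0
    pending: dict = field(default_factory=dict)

    def start(self, name, lower, upper):
        self.pending[name] = Pending(lower, upper, self.now)

    def may_fire(self, name):
        # Guard strengthened with the lower bound on the delay.
        p = self.pending.get(name)
        return p is not None and self.now - p.since >= p.lower

    def fire(self, name):
        if not self.may_fire(name):
            raise ValueError(f"{name} is not enabled at time {self.now}")
        del self.pending[name]

    def max_advance(self):
        # Time may not progress past the deadline of any pending action.
        deadlines = [p.since + p.upper for p in self.pending.values()]
        return min(deadlines) - self.now if deadlines else float("inf")

    def tick(self, dt):
        if not 0 <= dt <= self.max_advance():
            raise ValueError(f"cannot advance time by {dt}")
        self.now += dt
```

In the intended use, a precharge started at time 0 with bounds [0.5, 2.0] blocks time at 2.0 until it fires. Hence a guard such as "precharge started at least 2.5 ago" can only hold once precharging is complete.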



Refinement between the Timed Divider and Transistor-Level Divider. 

To establish that the transistor-level model implements the timed divider, two 
major problems have to be addressed. First, the dual-rail encoded signals of the 
transistor-level model must be mapped to the bit-vectors of the timed divider. 
Second, in the transistor-level model only the quotient digit output is used to 
determine if a stage has finished evaluation. It therefore needs to be shown that 
the quotient digit of a stage becomes valid only after all other outputs of a stage
are valid. This corresponds to Williams’ second optimization as mentioned in 
section 4.1. 

The first problem was addressed by defining an appropriate abstraction map- 
ping. Solving the second problem requires an argument about the timing of 
events as data values propagate from a stage’s inputs through its logic elements 
after it enters evaluation mode. Our verification adapted a simple depth-first 
graph traversal algorithm for timing verification of combinational logic for use 
in the self-timed context. The timing analysis is encapsulated as an inference 
rule that introduces a theorem, which in turn states a transistor-level safety
property expressing timing bounds for a stage’s outputs. The timing analysis 
requires several side conditions to hold (expressed as assumptions of the above 
theorem), stating e.g. that the inputs to a stage (i.e. its predecessor’s outputs) 
remain stable while it is in evaluation mode. Intuitively, the computation in the 
divider ring proceeds as follows: A stage’s dual-rail signals are reset to “empty” 
during precharging. In evaluation mode, the signals are assigned “valid” values 
based on the output signals of the previous stage. The previous stage, which is 
in hold mode, keeps its outputs unchanged while this stage is evaluating. The 
side conditions are satisfied as long as the divider conforms to this sequence. 
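A depth-first traversal for combinational timing is sketched below. The paper's adapted algorithm is not given, so the bounds used here are our conservative assumption. A node may switch as early as its fastest input plus its minimum delay, and is certainly valid by its slowest input plus its maximum delay. Inputs are taken as valid and stable from time 0, which is the side condition that the predecessor stage holds its outputs. The netlist is hypothetical. The last function checks Williams' assumption that the quotient output becomes valid no earlier than every other output.

```python
def arrival_windows(fanin, delay, inputs):
    """Earliest and latest time each node's output becomes valid after
    evaluation starts, computed by memoized depth-first traversal.

    fanin: node -> list of driving nodes; delay: node -> (dmin, dmax);
    inputs: nodes assumed valid and stable from time 0.
    """
    windows = {}

    def visit(node, on_path):
        if node in windows:
            return windows[node]
        if node in on_path:
            raise ValueError(f"combinational cycle through {node}")
        if node in inputs:
            windows[node] = (0.0, 0.0)
            return windows[node]
        dmin, dmax = delay[node]
        preds = [visit(p, on_path | {node}) for p in fanin[node]]
        # Conservative bounds: may switch after the fastest input,
        # certainly valid after the slowest one.
        windows[node] = (dmin + min(lo for lo, _ in preds),
                         dmax + max(hi for _, hi in preds))
        return windows[node]

    for node in fanin:
        visit(node, frozenset())
    return windows

def quotient_is_last(windows, quotient, others):
    """True if the quotient output can never become valid before the others."""
    return windows[quotient][0] >= max(windows[o][1] for o in others)

# Hypothetical netlist: two remainder bits and a quotient path via a small adder.
fanin = {"s0": ["r_in", "c_in"], "s1": ["r_in", "c_in"],
         "cpa": ["s0", "s1"], "q": ["cpa"]}
delay = {"s0": (1.0, 2.0), "s1": (1.0, 2.0), "cpa": (2.0, 3.0), "q": (1.0, 1.5)}
w = arrival_windows(fanin, delay, {"r_in", "c_in"})
print(w["q"], quotient_is_last(w, "q", ["s0", "s1"]))  # (4.0, 6.5) True
```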

To discharge the above side conditions, one needs to formally show that the
divider’s operation indeed follows the intuition. To this end, we introduced a side 
hierarchy of models that matched the handshaking of the original hierarchy with 
the details of the computation abstracted away. Corresponding safety properties 
were proven for the highest, speed-independent level of the side hierarchy, which 
were then inherited down (through refinement) to the transistor level and used 
to discharge the side conditions of the timing analysis. 

The introduction of the side hierarchy allowed us to discharge all proof obli- 
gations without ever having to prove an invariant or safety property directly at 
the transistor level. Due to the timed nature and the amount of detail present at 
this level, this would have been extremely difficult and time-consuming. See [15] 
for details on the timing analysis and the use of the side hierarchy. 

5 Conclusions 

We have demonstrated an approach to the verification of hardware designs that 
combines deductive reasoning with algorithmic decision procedures. Like theo- 
rem provers such as HOL, Isabelle or PVS, our tool employs the notion of proof 
states, to which a sequence of inference rules and decision procedures is applied 
to form a proof. The most important distinction between our tool and more 
traditional provers is that the set of available inference rules and decision proce- 
dures is not fixed, but may be extended with domain-specific rules. This permits 
reasoning that would be unacceptably costly to formalize rigorously in logic to 
be introduced into a correctness argument in a controlled manner. 

We have demonstrated the practical applicability of our approach by carrying 
out a top-to-bottom verification of a non-trivial hardware design, a self-timed 
implementation of SRT division. Our verification connects a high-level specification
of the SRT division algorithm with a formalization of the transistor-level
implementation through a series of refinement proofs. Safety properties proven
at the highest level, in particular correct division, are propagated down the chain 
of refinements and thus hold for the implementation. The proof obligations aris- 
ing from the safety property and refinement proofs varied widely in nature, from 
arithmetic obligations at the algorithmic level to timing properties at the tran- 
sistor level. Although there have been many published verifications of dividers, 
we believe that our work is distinguished by spanning the complete design hi- 
erarchy. Domain-specific proof rules such as the timing-verification procedure 
played a crucial role in achieving this. 
Abstract. We describe the design of an open-ended set of tools for
manipulating multi-dimensional tabular expressions. The heart of the
toolset is a set of modules that makes it possible to add new tools to the
toolset without having detailed knowledge of the other tools. This set
of modules, the Tool Integration Framework, supports new and existing
tools by providing abstract communication interfaces. The framework
that we describe has proven to be a practical approach to building an
extensible set of tools.



1 Introduction 

The McMaster University Software Engineering Research Group (SERG) studies 
documentation methods for computer systems engineering that use mathemati- 
cal functions and relations [18] to describe program behaviour. The mathematical 
expressions that describe the behaviour of real systems are often so complex that 
they are difficult to write and use. When expressions are written in a tabular 
form, they are much more easily formulated and interpreted [16]. 

While the value of tabular notation has often been demonstrated [4, 5, 3, 
6, 11], we believe that well designed tools can reduce both the effort needed to 
write tabular expressions and the number of errors in the documentation. To 
demonstrate this, our research group is developing a suite of tools, collectively 
known as the Table Tool System (TTS), for creating, editing, printing, analysing 
and interpreting tabular documentation. This paper presents an overview of the 
design of the TTS. It is intended both to draw attention to the TTS and to 
provide an example of a system in which modularisation, abstraction, and other 
related design principles are applied consistently. 

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 345-359, 1999. 
New components of the TTS are usually produced as Master's theses by
students who come, learn about software engineering, and then leave. It is nor- 
mally extremely difficult to get such independently written components to work 
together in a useful way. The Tool Integration Framework that is described in 
this paper has changed that. Components can be developed independently, then 
easily integrated into our system by people who do not know the details of either 
the old or the new components. 

1.1 Background 

Much traditional engineering documentation is mathematically based and pre- 
cise, rich in information content and consistently interpreted from user to user. 
Parnas and Madey [18] have shown how the essential properties of computer 
systems can be described by using mathematical relations. These relations can 
be characterised by first-order predicate logic (e.g., [17]). By providing these
relations, computer systems designers can document their designs systematically,
and use that documentation to conduct thorough reviews and analysis. 

The expressions that characterise the relations that result from applying func- 
tional documentation techniques to real programs are usually complex because 
they must distinguish many cases. When represented in their customary form 
(i.e. as a one-dimensional expression) they would be too difficult to read to be
practical. In [16], Parnas defined a new notation, called tabular expressions, that 
grew from earlier work at the Naval Research Laboratory in Washington, DC [4]. 
Tabular expressions in this new form have the same theoretical expressive power 
as conventional notation but, by organising the expression as an array of much 
simpler expressions, they are much easier for human readers to interpret. The 
reader is referred to [16, 7, 6] for descriptions of table types and interpretations. 
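A two-dimensional table of the kind often called a normal function table can be evaluated as follows. A row header predicate and a column header predicate select one grid cell, whose expression gives the value. Both header sets must be disjoint and complete. This Python class is a hypothetical illustration, not a TTS component; the reader should consult [16, 7, 6] for the precise table semantics. The example function |x| + |y| is ours, not the one in Fig. 1.

```python
class NormalTable:
    """A two-dimensional tabular expression (hypothetical sketch).

    rows: predicates on (x, y) selecting a row; cols: predicates selecting a
    column; grid[i][j]: expression giving the value when row i and column j hold.
    """

    def __init__(self, rows, cols, grid):
        self.rows, self.cols, self.grid = rows, cols, grid

    def __call__(self, x, y):
        i = self._select(self.rows, x, y)
        j = self._select(self.cols, x, y)
        return self.grid[i][j](x, y)

    @staticmethod
    def _select(headers, x, y):
        hits = [k for k, h in enumerate(headers) if h(x, y)]
        if len(hits) != 1:   # headers must be disjoint and complete
            raise ValueError(f"headers not disjoint/complete at {(x, y)}: {hits}")
        return hits[0]

    def check(self, samples):
        """Check both header sets on sample points (a test, not a proof)."""
        for x, y in samples:
            self._select(self.rows, x, y)
            self._select(self.cols, x, y)
        return True

f = NormalTable(
    rows=[lambda x, y: x >= 0, lambda x, y: x < 0],
    cols=[lambda x, y: y >= 0, lambda x, y: y < 0],
    grid=[[lambda x, y: x + y, lambda x, y: x - y],
          [lambda x, y: -x + y, lambda x, y: -x - y]],
)
print(f(-2, 3))  # 5
```

Each cell holds a much simpler expression than the equivalent nested conditional, which is the readability argument made above.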

The importance of readability in engineering documentation is clear when 
the role of documentation in the design process is considered. Documentation 
should capture the measurable objectives of the design effort. The actual re- 
sults of the design effort can be compared with the objectives expressed in the 
documentation at several points during the design process. For example, docu- 
mentation will be used as the basis for design reviews [13] and as the basis for 
testing procedures. 

The examples in Fig. 1 show three different formats for an expression representing
the function f(x, y). The benefits of representing expressions in tabular
form are shown even more clearly by longer, more realistic, expressions such as 
those in [1, 23]. 

Without support tools, a great deal of time is spent performing tasks that 
could be automated. Mathematical expressions can be checked mechanically and 
used for automated verification of the design specified by the documents. For ex- 
ample, manually comparing two tables to see if they represent the same function 
is a very important task that can be very time consuming and tedious; small 
errors can be difficult to detect. We need tools to help automate those jobs that 
can be automated, so that our time and energy can be devoted to the more 
interesting tasks of system design. 
(c) f(x, y) described using tabular notation
Fig. 1. Example Representations of the Function f(x, y)



1.2 Goals 

Because the syntax of tabular expressions is not compatible with existing document
production tools such as word processors, symbolic mathematics processors,
spreadsheets, etc., the goal of the TTS project is to develop an integrated,
extensible system of tools (that is, a set of tools that work together) to facilitate
the use of tabular expressions, e.g., in computer systems documentation.^ Many
complex tasks can be accomplished relatively easily by combining tools from a 
small set of carefully designed components. TTS capabilities include such things 
as entering and modifying tabular expressions, long term storage of expressions 
on disk, formatting of expressions for output, and transforming tables to other 
forms. 

Our understanding of how such systems should be implemented is growing. 
The development of TTS, and its need for extensibility, reflects this process. 
Each tool is first developed in a basic form and refined as experience with its 
use teaches us how to better implement it. Ideas for completely new components 
will certainly arise in the future. It is important to have a stable interface stan- 
dard that defines how these tools should interact with one another so that tools 
developed at different times by different designers can operate with each other. 
It is important to establish these standards as early in the project as practical 
to minimise the amount of rework. 

^ “Working together” means, for example, that an input tool can pass a tabular ex- 
pression to an output tool for display or to an evaluation tool to calculate a value. 
Each new member of our group typically either develops some new component
to extend the functionality of the TTS, or tries to apply the TTS to solve a new 
problem. To accomplish this, it is essential that the TTS be easy to extend, 
even if the extension had not been foreseen, and that the TTS components can 
be easily combined in new ways. Since people join the group at different times, 
work at different rates, have different backgrounds and usually leave as soon 
as they finish their thesis, we cannot expect them to function as a cohesive 
development team. Our tool integration framework seeks to overcome this by 
allowing developers to work independently while taking full advantage of the 
work done by others. 

1.3 General Design Principles 

The Table Tool System is designed for future growth by encapsulating design 
decisions regarding interaction between TTS components in a common set of 
modules known as the Tool Integration Framework. 

The modules of the TTS all have a “hidden” implementation [14]. By using 
the principle of information hiding, users of a module need not know anything 
about the internal data structures and other implementation details of that 
module. In other words, any module can be treated as a black box. The interface 
to a module consists of a set of access programs. TTS developers can only access
modules by calling those programs. The use of any TTS module actually consists 
of writing a list of access program calls, for example, to create, manipulate and 
destroy objects of a type implemented by the module. Modules may support any 
number of objects, so in the currently popular object-oriented terminology, each 
implements an object class where the access programs correspond to methods. 
Although inheritance can save effort and time during the initial coding, we have 
seen it used in ways that make subsequent maintenance more difficult and have 
avoided it. We did not find it necessary to use an O-O language or terminology
although we applied many of the good design principles that are implicit in some
O-O approaches.
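The access-program style described above can be illustrated without classes or inheritance. The sketch below is a hypothetical Python module, not a TTS module. Callers hold only opaque handles and call access programs to create, manipulate and destroy objects. The representation (here a dict per object) is the module's secret and could be changed without affecting callers.

```python
# Hypothetical "expression object" module in the information-hiding style:
# callers use only the access programs below and opaque handles.
_objects = {}        # the module's secret: handle -> internal representation
_next_handle = 0

def _lookup(handle):
    try:
        return _objects[handle]
    except KeyError:
        raise ValueError(f"invalid handle {handle!r}") from None

def create(text):
    """Create an object holding an expression; return an opaque handle."""
    global _next_handle
    _next_handle += 1
    _objects[_next_handle] = {"text": text}   # representation may change freely
    return _next_handle

def get_text(handle):
    return _lookup(handle)["text"]

def set_text(handle, text):
    _lookup(handle)["text"] = text

def destroy(handle):
    _lookup(handle)                 # reject invalid handles first
    del _objects[handle]
```

A module may support any number of such objects, each identified by its handle, which corresponds to the "object class with methods" view without using inheritance.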

Since we are planning for a large family of TTS tools, we have not followed 
the classical “top down” approach to system design. Instead, we have chosen to 
specify and construct some basic “building-blocks” to be used in the whole tool 
family. These building-blocks have been selected using the principle that E. W. 
Dijkstra has called “separation of concerns”.

1.4 Documentation 

Since our research focus is documentation for computer systems development, it 
makes sense to use this project as a proving ground for our methods. We have 
found that a combination of both formal and informal documents is required. 
The complete documentation of the TTS is given in [24].^ The informal system 

^ The printed form of this document is not kept up to date. For internal use an up-to- 
date electronic version is maintained. 
overview and module guide sections, from which this paper is primarily drawn,
serve to introduce new group members to the structure and capabilities of the 
TTS. In addition, for each module, we produce two different styles of interface 
description: an informal module interface guide, and a formal module interface 
specification. The informal document serves as a quick guide to the module 
so that developers can gain an intuitive understanding of its capabilities. The 
formal specification is used as a reference document to get specific information 
about the module behaviour. 

For the kernel modules, which are critical to the system, formal internal 
design documents have been produced as well and these have been used for 
structured review of the code. Unfortunately, since the TTS was in its early 
stages of development when the kernel was built, we could not use it to help 
with that documentation process, so we learned first hand how tedious it can be 
to use mathematical methods without supporting tools. 

2 TTS Module Structure 

This section describes the modular structure of the TTS by describing the se- 
cret (hidden information) of each module in the system. Figure 2 shows a design 
schematic for the TTS: nested boxes represent the sub-module relationship, and 
the vertical arrangement roughly corresponds to the uses hierarchy [12, 15]. At 
the first level of decomposition, the TTS is divided into three modules: kernel, 
infrastructure and applications, which are described in more detail below. The 
intended relationship between these modules is that programs in the kernel mod- 
ule use only other kernel programs, while programs in the infrastructure module 
use other infrastructure programs or kernel programs, but not applications, and 
applications may use any program. Thus, these modules can be viewed as tiers, 
each building on the ones below. These tiers do not, however, necessarily cor- 
respond to levels in the uses hierarchy since, for example, each module may 
contain programs that use no others, and hence are at the lowest level of the 
hierarchy. The tier structure gives a useful overview of the system, but it omits 
information that is included in the uses relation, which is needed, for example, 
for maintenance and testing. 
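The tier rule above can be checked mechanically once the uses relation is written down. The following Python sketch, in which the tier assignment and the uses pairs are purely hypothetical, reports every pair in which a program uses a program from a higher tier. It says nothing about levels in the uses hierarchy, which need the full relation.

```python
from typing import Dict, List, Tuple

# Rank of each tier: a program may only use programs of equal or lower rank.
TIER_RANK = {"kernel": 0, "infrastructure": 1, "applications": 2}


def tier_violations(
    tier_of: Dict[str, str], uses: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (user, used) pairs that break the tier rule.

    tier_of maps each program name to its tier; uses lists pairs
    (a, b) meaning "program a uses program b"."""
    return [
        (a, b)
        for a, b in uses
        if TIER_RANK[tier_of[b]] > TIER_RANK[tier_of[a]]
    ]


# Hypothetical example: a kernel program wrongly using a tool.
tier_of = {"TH": "kernel", "Info": "kernel", "TCT": "infrastructure", "TOG": "applications"}
uses = [("TCT", "TH"), ("TOG", "TCT"), ("TOG", "Info"), ("TH", "TCT")]
print(tier_violations(tier_of, uses))  # [('TH', 'TCT')]
```

Applications may use anything, so only uses that point "upward" are reported.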

Kernel The lowest tier contains essential modules — those that are used by all 
other TTS modules. It hides the implementation of the abstract data types 
that represent expressions, their semantics and representation. These data 
types, known as TTS objects, are the only objects that may be passed be- 
tween tools. Kernel modules do not interpret expressions. 

Infrastructure The middle tier contains tools that operate on TTS objects to 
provide some service to the user (e.g., modification of an expression), and 
modules that allow these tools to be combined. These are the ‘blocks and 
mortar’ from which applications are constructed. The infrastructure provides 
a useful set of operators for kernel objects but does not share the secret of 
the kernel. 
Fig. 2. TTS Design Schematic 



Applications The top tier contains programs that combine infrastructure mod- 
ules to manipulate or interpret groups of tabular expressions as documents, 
which specify or describe some aspect of a computer system. For example, 
a set of expressions might be interpreted as a relational specification of the 
intended behaviour of a program, and an application could be used to check 
if the observed behaviour of the program is consistent with the specification. 



2.1 Kernel 

The kernel module hides data structures representing expressions, and algo- 
rithms for manipulating them. The kernel is divided into two modules: 
table holder hides data structures representing expression structure and identifiers; 

information module hides the means of associating presentation and seman- 
tics information with particular symbols used in expressions. 

Table Holder A mathematical expression, whether tabular or not, can be 
viewed as a set of components joined by operators, where each component is 
itself an expression. The tools that interpret these expressions must be able to 
access any component of the expression. The table holder module provides a 
mechanism that allows tools to do this without knowing how the information 
about the expression is represented within the computer. 

Each component of an expression has an address, which is the path that one 
must follow to reach that component from a starting point that is the address 
of the whole expression. Each path is a sequence of indices; each index identifies 
a subexpression. Each index in the path brings us closer to the subexpression 
addressed by the path. 

Information stored in the table holder is restricted to the structural informa- 
tion about expressions; this is the information needed by all tools that manipu- 
late expressions. It contains no information about interpretation. For example, 
a table is stored as an array of grids without any assumptions about which grid 
will be considered the main grid and which are to be interpreted as headers. 
A function is stored as a list of arguments and an identifier which will index 
into an external semantic description. By carefully isolating these elements of 
the expressions, which are independent of interpretation and are not likely to 
change, the table holder forms the foundation upon which the table tools system 
is built. The table holder contains four sub-modules: Expression, which hides the 
data structure representing actual expression structure, Shape, Index and Path, 
which each hide data structures representing auxiliary objects that facilitate the 
expression manipulation activities that must be performed. These are explained 
below. 

Expression The Expression module hides the implementation of the data structure 
and algorithms for representing and manipulating expressions syntactically. 
Expressions are grouped into three different categories based upon their different 
structures: atoms (constants or variables), applications (functions or predicates 
with one or more arguments), and tables. The module has access programs for 
creating, destroying, copying, modifying, reading, comparing, loading and saving 
expressions. 
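As a rough illustration of the three expression categories, the following Python sketch models atoms, applications and tables as immutable data types. The class names, the use of integer Ids and the representation of each grid as a two-dimensional tuple of rows are assumptions for illustration only; the real Expression module deliberately hides its representation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """A constant or variable, identified only by its symbol Id."""
    id: int


@dataclass(frozen=True)
class Application:
    """A function or predicate Id applied to one or more arguments."""
    id: int
    args: Tuple[Expression, ...]

    def __post_init__(self) -> None:
        if not self.args:
            raise ValueError("an application needs at least one argument")


@dataclass(frozen=True)
class Table:
    """A table stored purely structurally, as a tuple of grids.

    Each grid is a tuple of rows of cells (two-dimensional here for
    brevity). No grid is marked as main grid or header; that is left to
    whichever tool interprets the table."""
    grids: Tuple[Tuple[Tuple[Expression, ...], ...], ...]


Expression = Union[Atom, Application, Table]


def subexpressions(e: Expression) -> Tuple[Expression, ...]:
    """Return the immediate components of an expression."""
    if isinstance(e, Atom):
        return ()
    if isinstance(e, Application):
        return e.args
    return tuple(cell for grid in e.grids for row in grid for cell in row)


def size(e: Expression) -> int:
    """Count every node of the expression tree, including e itself."""
    return 1 + sum(size(s) for s in subexpressions(e))


# Structural equality comes for free with dataclasses ("comparing" expressions).
f = Application(1, (Atom(2), Atom(3)))
print(size(f), f == Application(1, (Atom(2), Atom(3))))  # 3 True
```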

Index The Index module hides the data structure for representing the position 
of a table cell within a table. An Index consists of a grid number (int) and a 
sequence of n numbers (int), where n is the dimensionality of the grid, which 
indicate a position in the table (e.g. the row number and column number of 
the cell). The module has access programs for creating, copying and destroying 
indices, as well as assigning and retrieving grid numbers and dimension values 
and incrementing or decrementing an index value with respect to a Shape. 
Path The Path module hides the data structure for representing the position 
of a sub-expression within an expression. A Path is a sequence of objects, each 
of which is either an Index or an int. Integers are used to specify a particular 
argument of an application and Index objects identify an element of a table. 
The length of a Path can be increased by inserting an element, or decreased 
by deleting an element at any position. The module has access programs for 
creating, copying and destroying paths, as well as inserting, assigning, deleting 
and retrieving elements at any position of the path and retrieving the length of 
a path. 
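Following a path is a simple loop over its elements, each step moving one level deeper into the expression. The Python sketch below uses a toy representation in which each grid of a table is a dictionary from coordinate tuples to cell contents and argument numbers are 0-based; these choices, and all names, are assumptions for illustration. Because a Path is just a sequence here, insertion and deletion correspond to ordinary list operations.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Atom:
    id: int                      # symbol Id of a constant or variable


@dataclass
class Application:
    id: int                      # symbol Id of the function or predicate
    args: List["Expr"]


@dataclass
class Table:
    # grids[g] maps a cell's coordinates (one int per dimension) to its contents
    grids: List[Dict[Tuple[int, ...], "Expr"]]


@dataclass
class Index:
    grid: int
    coords: Tuple[int, ...]


Expr = Union[Atom, Application, Table]
PathElement = Union[int, Index]


def resolve(expr: Expr, path: List[PathElement]) -> Expr:
    """Follow a path from the whole expression down to one sub-expression.

    An int selects an argument of an application; an Index selects a
    cell of a table. The empty path addresses expr itself."""
    current = expr
    for step in path:
        if isinstance(step, Index):
            if not isinstance(current, Table):
                raise TypeError(f"Index step {step} applied to a non-table")
            current = current.grids[step.grid][step.coords]
        else:
            if not isinstance(current, Application):
                raise TypeError(f"argument step {step} applied to a non-application")
            current = current.args[step]
    return current
```

For example, with `t = Table([{(0, 0): Atom(3), (0, 1): Application(7, [Atom(1), Atom(2)])}])`, the path `[Index(0, (0, 1)), 1]` addresses `Atom(2)`.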

Shape The shape of a table is the number of grids it contains and the number of 
dimensions and length in each dimension of each grid. The Shape module hides 
the data structure for representing the shape of a table. It exists so that a user can 
separate the task of describing the shape of a table from the action of creating 
the table. A Shape object can be retrieved from an existing table when needed. 
The module has access programs for creating, copying and destroying shapes, as 
well as assigning and retrieving the number of grids, the dimensionality of each 
grid and the length of each grid in each dimension. 
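The Index operation "increment with respect to a Shape" can be sketched as follows. The sketch assumes that cells are visited in row-major order within a grid, that grids are visited in order, and that every grid has length at least 1 in each dimension; the traversal order and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Shape:
    # grid_lengths[g] holds the length of grid g in each of its dimensions
    grid_lengths: Tuple[Tuple[int, ...], ...]

    def num_grids(self) -> int:
        return len(self.grid_lengths)

    def dimensionality(self, grid: int) -> int:
        return len(self.grid_lengths[grid])


@dataclass(frozen=True)
class Index:
    grid: int
    coords: Tuple[int, ...]   # one position per dimension of the grid


def increment(index: Index, shape: Shape) -> Optional[Index]:
    """Next cell in row-major order within a grid, then the next grid.

    Returns None when index is the last cell of the last grid."""
    lengths = shape.grid_lengths[index.grid]
    coords = list(index.coords)
    # Advance the last dimension first, carrying into earlier ones.
    for d in reversed(range(len(coords))):
        coords[d] += 1
        if coords[d] < lengths[d]:
            return Index(index.grid, tuple(coords))
        coords[d] = 0
    # Every dimension wrapped around: move to the first cell of the next grid.
    nxt = index.grid + 1
    if nxt < shape.num_grids():
        return Index(nxt, (0,) * shape.dimensionality(nxt))
    return None


def all_indices(shape: Shape) -> Iterator[Index]:
    """Enumerate every cell of every grid by repeated incrementing."""
    if shape.num_grids() == 0:
        return
    idx: Optional[Index] = Index(0, (0,) * shape.dimensionality(0))
    while idx is not None:
        yield idx
        idx = increment(idx, shape)
```

Keeping the shape separate from the table, as the paper describes, lets such traversals be written without reference to cell contents.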



Information The information module associates each Id stored in a table 
holder expression with information about the symbol that it represents. The 
secrets of the Information module are the data structures used to associate an 
Id with its data and the algorithms for manipulating these. The symbols are 
grouped into symbol tables within which each symbol is identified by an Id, 
the value of which is determined by the information module when the symbol is 
created. The Id can be stored as part of an expression in the table holder so that 
other modules can use it to gain access to the data in the information module. 
For each symbol table, the information about symbols is organised into named 
information classes (e.g. name, type, font family). Each symbol may or may not 
have data for a given information class, so that new classes can be added when 
needed — for example, to support new tools — without needing to modify existing 
symbols, expressions or tools. This is key to the extensibility of the TTS. 

The module provides access programs for creating and destroying symbol ta- 
bles and information classes, and for creating symbols and assigning or retrieving 
data for a particular class name and symbol Id, or searching to find symbols that 
are associated with specific data or data patterns. 
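To make the organisation by Ids and information classes concrete, here is a hypothetical Python sketch of a single symbol table. It assumes that Ids are consecutive integers, that data values are arbitrary Python objects, and that "data patterns" are regular expressions matched against string data; the real module hides all of these choices.

```python
import re
from typing import Any, Dict, List, Optional, Set


class SymbolTable:
    """Hypothetical sketch of one symbol table in the Information module.

    Ids are chosen by the table when a symbol is created; data is stored
    per named information class, and a symbol need not have data for
    every class, so classes can be added later without touching symbols."""

    def __init__(self) -> None:
        self._next_id = 0
        self._ids: Set[int] = set()
        self._classes: Dict[str, Dict[int, Any]] = {}

    def create_class(self, name: str) -> None:
        self._classes.setdefault(name, {})

    def destroy_class(self, name: str) -> None:
        del self._classes[name]

    def create_symbol(self) -> int:
        new_id = self._next_id
        self._next_id += 1
        self._ids.add(new_id)
        return new_id

    def assign(self, class_name: str, sym_id: int, data: Any) -> None:
        if sym_id not in self._ids:
            raise KeyError(f"unknown Id {sym_id}")
        self._classes[class_name][sym_id] = data   # KeyError if no such class

    def retrieve(self, class_name: str, sym_id: int) -> Optional[Any]:
        """Return the symbol's data for the class, or None if it has none."""
        return self._classes[class_name].get(sym_id)

    def search(self, class_name: str, pattern: str) -> List[int]:
        """Ids whose string data for the class fully matches a regex."""
        rx = re.compile(pattern)
        return sorted(
            i for i, d in self._classes[class_name].items()
            if isinstance(d, str) and rx.fullmatch(d)
        )
```

Adding a new class such as "type" after symbols exist leaves every existing symbol valid; `retrieve` simply returns None until data is assigned.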



2.2 Infrastructure 

The TTS infrastructure module hides data structures and algorithms that enable 
users to manipulate TTS objects. This module contains three sub-modules: 

Tool Integration Framework TIF modules provide a means of interaction 
between tools and between users and the TTS. 
Tools Primitive services are atomic operations on expressions or symbols that 
the user can invoke from the TTS main menu (e.g., printing, combining or 
editing expressions). A tool is a module that provides one or more primitive 
services. 

Utilities A utility is a module that implements an algorithm that may be useful 
to more than one tool and is independent of both the user interface and the 
tool integration framework. Utilities cannot be invoked directly by a user 
(i.e. they do not provide ‘primitive services’). 



Tool Integration Framework (TIF) The TIF module hides the interface 
between tools so that they need not know what other tools are in use or available 
in the system. It also provides a user interface for invoking individual tools and 
passing objects between them in a consistent manner. It is made up of five 
modules, as described below. 

Tool Manager (TM) The secrets of the tool manager are the data structures 
that represent those characteristics of tools that are relevant to the TIF, and 
the information needed about the run-time instances of tools. The TM also 
hides the algorithms for invoking the tools. The tool manager defines a language 
that can be used to define applications by describing combinations of tools. 
Available components can be added or deleted at run-time. The TM defines a 
standardised interface, to which all of the tool modules it manages must conform. 
It is composed of three sub-modules: the Service Registry, the Instance Registry 
and the Tool Interface modules. 

A service is a procedure that the user invokes via a single command. It is 
provided by executing a sequence of one or more operations (primitive services) 
provided by the TTS tools. The Service Registry contains the following infor- 
mation about each service, so that it can be invoked when needed: 

— the type of user interface required (i.e., graphical, none), 

— what name and menu item is used to refer to the service, 

— what type(s) of objects a service operates on, 

— how the service is provided (i.e., what access programs, of what tools, in 
what order). 

The Instance Registry module tracks service use by recording the instances of 
primitive services that are executing at a particular time. The Tool Interface module 
maintains information about the invocation history of the tools so that it can 
determine what needs to be done before it can handle requests for service (e.g., 
whether a tool needs to be initialised first). 
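The registry entries listed above can be pictured as records that the tool manager consults when a service is invoked. In this hypothetical Python sketch a service is an ordered list of (tool, access program) steps, each step receives the result of the previous one, and the object's type is checked against the service's declared object types. None of these names or conventions come from the TTS interface specifications.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

AccessProgram = Callable[[Any], Any]


@dataclass
class ServiceEntry:
    ui_type: str                    # kind of UI required, e.g. "graphical" or "none"
    menu_item: str                  # name shown to the user
    object_types: Tuple[str, ...]   # types of TTS object the service accepts
    steps: List[Tuple[str, str]]    # (tool name, access program name), in order


class ServiceRegistry:
    """Hypothetical registry mapping service names to primitive-service steps."""

    def __init__(self, tools: Dict[str, Dict[str, AccessProgram]]) -> None:
        self._tools = tools
        self._services: Dict[str, ServiceEntry] = {}

    def register(self, name: str, entry: ServiceEntry) -> None:
        self._services[name] = entry

    def unregister(self, name: str) -> None:
        del self._services[name]

    def invoke(self, name: str, obj_type: str, obj: Any) -> Any:
        entry = self._services[name]
        if obj_type not in entry.object_types:
            raise TypeError(f"service {name!r} does not accept {obj_type!r} objects")
        result = obj
        for tool, program in entry.steps:
            # Each primitive service receives the previous step's result.
            result = self._tools[tool][program](result)
        return result
```

Because services are data rather than code, they can be registered or removed at run-time, which matches the paper's description of adding and deleting components while the system runs.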

User Interface Manager (UIMgr) The UIMgr provides a UI ‘framework’ in which 
tools can interact with users without their designers needing to invest effort de- 
veloping UI functionality that is not specific to their tool — it hides the char- 
acteristics of Motif that would otherwise need to be known by the developer of 
each tool that has a graphical user interface (e.g., it encapsulates the Motif event 
loop). For tools that have a graphical UI, the UIMgr makes available a parent 
window handle, so that the tools can create their own windows as children of the 
parent window. It also provides a uniform means for tools to inform other tools 
of significant events (e.g., an expression has been modified) without the sender 
or receiver needing any knowledge of the other tool(s). UIMgr does not hide the 
choice of Motif as a UI platform, however, since to develop a sufficiently general 
purpose abstract interface for graphical user interfaces would involve duplication 
of a significant portion of systems such as Motif. 
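A uniform way for tools to announce events without knowing about each other is commonly built as a publish/subscribe (observer) mechanism. The following sketch assumes that style; the class name, event names and payloads are hypothetical, and the paper does not say how the UIMgr implements notification internally.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBus:
    """Relays named events so that senders and receivers never refer to
    each other directly (hypothetical sketch of a notification service)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def unsubscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].remove(handler)

    def publish(self, event: str, payload: Any) -> None:
        # Copy the list so handlers may (un)subscribe while being notified.
        for handler in list(self._handlers[event]):
            handler(payload)
```

A table editor would publish "expression-modified" after a change, and a printing tool that subscribed earlier would refresh its view, with neither tool importing the other.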

The secret of the UIMgr is the implementation of the TTS ‘top level’ User 
Interface, which enables users to invoke services to operate on TTS objects and 
to switch between concurrently executing services. It also hides the calls to access 
programs that initialise the TIF and kernel modules and the order in which they 
are called. 

User Error Interface The secret of the User Error Interface module is the method 
of reporting error and status messages to users. It provides access programs for 
tools to report errors to users. The user can customise the UI so that errors 
of severity level below a threshold will not be reported (although not all error 
messages can be disabled) and can have error messages logged to a file. 

Error Handling The Error Handling module hides the data structure for repre- 
senting the status of invocations of tools and TIF module access programs. It 
also hides an algorithm for translating a status token into a textual description. 
The module has access programs for setting the status as well as for retrieving 
the status token or the textual description of a status token. 
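A minimal sketch of the two error modules working together follows. It assumes a four-level severity scale, string status tokens and a rule that FATAL messages can never be suppressed; all three are illustrative assumptions, since the paper does not list the actual levels, tokens or which messages are mandatory.

```python
from enum import IntEnum
from typing import Callable, Optional, TextIO


class Severity(IntEnum):
    STATUS = 0
    WARNING = 1
    ERROR = 2
    FATAL = 3


# Error Handling side: hypothetical status tokens and their descriptions.
STATUS_TEXT = {
    "OK": "operation completed successfully",
    "NO_SUCH_ID": "the Id is not defined in this symbol table",
    "BAD_PATH": "the path does not address a sub-expression",
}


def describe(token: str) -> str:
    """Translate a status token into a textual description."""
    return STATUS_TEXT.get(token, f"unknown status {token!r}")


class ErrorReporter:
    """User-interface side: filters messages by a user-chosen threshold."""

    def __init__(
        self,
        display: Callable[[str], None] = print,
        threshold: Severity = Severity.STATUS,
        log: Optional[TextIO] = None,
    ) -> None:
        self.display = display
        self.threshold = threshold
        self.log = log

    def report(self, severity: Severity, token: str) -> bool:
        """Log the message if logging is on; return True if it was displayed."""
        text = f"{severity.name}: {describe(token)}"
        if self.log is not None:
            self.log.write(text + "\n")   # logging ignores the threshold
        # FATAL messages cannot be suppressed by the threshold.
        if severity >= self.threshold or severity == Severity.FATAL:
            self.display(text)
            return True
        return False
```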

Clipboard/Selection Manager (CSMgr) The Clipboard/Selection Manager module 
hides data structures and algorithms for passing TTS objects between tools. 
It defines a standard interface through which TTS objects can be exchanged 
between tools. 



Tools Module The Tools module hides algorithms and data structures for 
tools that provide primitive services enabling users to directly manipulate TTS 
objects. Some example tools are described below. Tools and applications are 
normally composed of several sub-modules but, since these are not the focus of 
this paper, their detailed structure is not described here. 

Context Manager (CM) A context is an ordered set of named expressions 
together with a symbol table containing information about the symbols in the 
expressions. Since the expressions use the same symbol table, Ids have a 
consistent interpretation throughout the context, which simplifies manipulation 
and interpretation of expressions. 

The CM hides the implementation of a user interface and file format for 
working with contexts. Several contexts may be in use at the same time within 
one application. The user interface allows the user to 
— load a context from disk, 

— save a context to disk, 

— create new (empty) expressions, 

— delete expressions, and 

— select expressions for use by other tools. 

Table Printing Tool (tptool) The Table Printing Tool module hides the imple- 
mentation of a tool for viewing and changing the physical appearance of an 
expression without changing its meaning. The tool allows the user to adjust 
such things as the print size of parts of expressions, and the width and height 
of table rows or columns, but does not allow any modifications to the contents 
of the expression, or changes to the symbol information that may change their 
interpretation (e.g. font-face). 

The tool produces a PostScript representation of the expression, suitable for 
printing. It uses the table holder (TH) and Information modules, respectively, to 
retrieve the expression and symbol information [26]. 

Table Construction Tool (TCT) The TCT hides the implementation of a user 
interface for entering and editing the contents of an expression, while ensuring 
that it is syntactically correct. It allows the user to construct an expression 
by building it up from smaller expressions. It uses the information module to 
retrieve symbols and the TH to retrieve the expression. Several instantiations of 
the TCT may be in use at the same time within one application. [10] 

Specialisation and Simplification Tool (SAST) The specialisation and simplifi- 
cation tool provides algorithms for simplifying tabular expressions by taking into 
account user-supplied constraints on the variables that appear in the expression. 

Symbol editor The Symbol Editor allows a user to modify the set of symbols 
available for use in expressions. It consists of two sub-modules: the Symbol Editor 
UI and the Symbol Utilities modules. 

The Symbol Editor UI module hides the implementation of a user interface 
for loading, viewing, selecting, editing and saving the information about symbols. 

The secrets of the Symbol Utilities module are the files and information 
classes used to represent common (default) symbols and symbol property inher- 
itance in the Information module. It has access programs that mirror some of 
the access programs of the Info module but take default symbols and inheritance 
into account. 

Table Inverter The table inverter module hides algorithms for ‘inverting’ and 
‘normalising’ tabular expressions. In some cases, tabular expressions can be eas- 
ier to understand, or made more compact if displayed in a different form. Table 
transformations and tools for performing these functions are described in [22]. 

Table Composition Tool The table composition tool hides algorithms to calcu- 
late the relational composition of two tabular expressions. [25] 
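For relations given explicitly as finite sets of pairs, the relational composition R ; S relates a to c exactly when some b satisfies (a, b) ∈ R and (b, c) ∈ S. The sketch below computes this directly. It only illustrates the operation itself, not the tool, which works on tabular expressions; the function name is hypothetical.

```python
from typing import Dict, Set, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(r: Set[Tuple[A, B]], s: Set[Tuple[B, C]]) -> Set[Tuple[A, C]]:
    """Relational composition: (a, c) with (a, b) in r and (b, c) in s for some b."""
    # Index s by its first component so each pair of r is joined in one lookup.
    successors: Dict[B, Set[C]] = {}
    for b, c in s:
        successors.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in successors.get(b, ())}
```

For example, composing `{(1, "x"), (2, "y")}` with `{("x", True), ("y", False), ("y", True)}` gives `{(1, True), (2, False), (2, True)}`.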
Table Checking Tool The table checking tool hides algorithms to check, us- 
ing an automated proof system, that tabular expressions satisfy two condi- 
tions: disjointness and completeness, which are usually requirements for correct 
specifications [8]. 
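The tool itself discharges these conditions with an automated proof system. As a much weaker stand-in that only works over small finite domains, the sketch below enumerates every input and reports overlaps (a disjointness failure) and gaps (a completeness failure). The function name and the encoding of a table's conditions as Python predicates are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple

Condition = Callable[..., bool]


def check_table(
    conditions: Sequence[Condition], domains: Sequence[Iterable]
) -> Tuple[List[Tuple[tuple, List[int]]], List[tuple]]:
    """Exhaustively test a table's conditions over a finite input space.

    Returns (overlaps, gaps): overlaps lists inputs satisfying more than one
    condition (not disjoint); gaps lists inputs satisfying none (not complete)."""
    overlaps: List[Tuple[tuple, List[int]]] = []
    gaps: List[tuple] = []
    for point in product(*domains):
        hits = [i for i, cond in enumerate(conditions) if cond(*point)]
        if not hits:
            gaps.append(point)
        elif len(hits) > 1:
            overlaps.append((point, hits))
    return overlaps, gaps


# The classic sign table: x < 0 | x = 0 | x > 0 over a small sample domain.
print(check_table([lambda x: x < 0, lambda x: x == 0, lambda x: x > 0], [range(-3, 4)]))
# ([], [])
```

Replacing the conditions by `x <= 0` and `x >= 0` would report the overlap at `x = 0`; unlike a proof, an empty result here says nothing about inputs outside the sampled domain.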



Utilities The Utilities module hides algorithms that may be useful to more than 
one tool but do not implement primitive services and hence cannot be invoked 
directly by a user. This module is independent of the user interface system (i.e. 
Motif) and the TIF module interface. 

Kernel Utilities Kernel Utilities are utilities that make use of the TTS kernel. 

The expression utilities module hides algorithms for traversing the sub-ex- 
pressions of an expression and for manipulating the set of Ids used in an ex- 
pression. It has access programs for calling a caller-supplied program for each 
sub-expression of an expression, for finding all Ids used in an expression and 
for substituting all occurrences in an expression of an Id from one list with the 
corresponding Id in another list. 
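The three expression utilities described above can be sketched over a toy expression model. In the following Python sketch the data types (with table grids flattened to a plain sequence of cells) and all function names are hypothetical. Substitution is simultaneous, so swapping two Ids with old = [1, 2] and new = [2, 1] behaves as expected.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple, Union


@dataclass(frozen=True)
class Atom:
    id: int


@dataclass(frozen=True)
class App:
    id: int
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class Table:
    cells: Tuple["Expr", ...]   # grids flattened to one cell sequence for brevity


Expr = Union[Atom, App, Table]


def children(e: Expr) -> Tuple[Expr, ...]:
    if isinstance(e, App):
        return e.args
    if isinstance(e, Table):
        return e.cells
    return ()


def for_each_subexpression(e: Expr, visit: Callable[[Expr], None]) -> None:
    """Call visit on e and then on every sub-expression, in pre-order."""
    visit(e)
    for child in children(e):
        for_each_subexpression(child, visit)


def ids_used(e: Expr) -> Set[int]:
    """All symbol Ids occurring in e (tables themselves carry no Id)."""
    found: Set[int] = set()

    def collect(sub: Expr) -> None:
        if isinstance(sub, (Atom, App)):
            found.add(sub.id)

    for_each_subexpression(e, collect)
    return found


def substitute_ids(e: Expr, old: List[int], new: List[int]) -> Expr:
    """Replace each occurrence of old[i] by new[i], simultaneously."""
    if len(old) != len(new):
        raise ValueError("old and new Id lists must have the same length")
    mapping = dict(zip(old, new))

    def rebuild(sub: Expr) -> Expr:
        if isinstance(sub, Atom):
            return Atom(mapping.get(sub.id, sub.id))
        if isinstance(sub, App):
            return App(mapping.get(sub.id, sub.id), tuple(rebuild(a) for a in sub.args))
        return Table(tuple(rebuild(c) for c in sub.cells))

    return rebuild(e)
```

`ids_used` is itself written on top of the caller-supplied traversal, which is the kind of reuse the expression utilities module is meant to enable.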

The info utilities module hides algorithms for performing common opera- 
tions on symbol tables that have been created by the info module. It has access 
programs for merging tables, removing lists of Ids from tables and finding the 
intersection or union of the set of classes in two tables. 

The generalised table semantics module hides algorithms and data structures 
that represent the semantics of a table as part of an expression. This semantic 
information can be accessed by other TTS table evaluation tools and applications 
like the test oracle generator (see Section 2.3). 

General Utilities General utilities are those utilities that do not make use of the 
TTS kernel. 

The secret of the id list module is the data structure for representing se- 
quences (lists) of Ids. It also hides algorithms for searching and manipulating 
these lists. Note that since this module does not manipulate Ids in any way, it 
is independent of the TTS kernel. 



2.3 Applications 

The applications module hides the implementation of applications, which treat 
TTS objects as components of relational documentation and allow the user to 
edit, analyse or interpret that documentation. 



Test Oracle Generator (TOG) The test oracle generator interprets a set 
of expressions as a program specification and uses it to generate a ‘test ora- 
cle’, which can be used to verify the actual behaviour of a program against the 
specification. [19, 21] 
Module Reliability Evaluation Tool (MRET) The module reliability estimation 
tool interprets a set of expressions as a module interface specification 
and, using an operational profile and a module under test, estimates the relia- 
bility of the module. [9] 

Monitor Generator The monitor generator interprets a set of expressions 
as a system requirements specification for a real-time system and uses it to 
generate a ‘monitor’, which reports if the system behaviour is consistent with 
the requirements. [20] 

3 Experience 

In order to test the viability of the TIF and to validate our design, several tools 
that were developed prior to its conception have been integrated into the TTS. 
The integration process in some cases has been complicated by inconsistencies 
between the various developers in the assumptions implicit in their designs. 
Despite this, we have been successful in integrating these tools, and more tools 
are being added as resources permit. 

Group members who have started to develop new tools since the TIF has 
been added have found that they have a significant advantage over their prede- 
cessors. They do not have to spend time developing support software in order 
to demonstrate their results. 

Although the TIF is a relatively new addition to the TTS, our experience 
so far has convinced us of its value and of the suitability of the ‘framework’ 
architecture to development environments such as ours. We have found that 
the architecture encourages new group members to develop tools that integrate 
smoothly with the TTS and, at the same time, allows them to concentrate on 
the research problem at hand by removing the need for them to create support 
software in order to demonstrate their results. Although it is difficult to be 
certain with such a small sample size, it appears that since the introduction of 
the TIF to the TTS the pace of tool development and integration has increased 
significantly. 

4 Further Information 

Space limitations prevent us from providing complete descriptions of the inter- 
faces to these modules. Further information can be found in [24] or by visiting 
the TTS web page at http://www.crl.mcmaster.ca/SERG/TTS/ttsHome.html. 
Abstract. Program analysis is still characterized by paradigm-specific 
approaches, which are developed to accommodate the diversity of the 
different programming paradigms, e.g. the imperative, object-oriented, 
or parallel one. Switching between paradigms or transferring analyses 
across paradigm boundaries usually requires detailed knowledge of the 
peculiarities of the various approaches. This complicates both the reuse of 
analyses and the proofs of their correctness. On the other hand, abstract 
interpretation provides a unifying view of program analysis. In this article 
we exploit this for the construction of program analysis generators 
based on a uniform design principle. Basically, we proceed by extracting 
the abstract kernel from the standard analysis framework, which we then 
consider from a generic perspective. We show that it has concrete 
instances in paradigms as different as those above. As a by-product, their 
decomposition into a “theoretical” and a “practical” part, which are 
specificational and computational in nature, respectively, reveals the 
aforementioned design principle. The frameworks and their respective generators, which can 
be fed by concise specifications, can thereby be considered black-boxes: 
analysis designers only need to know of the (quite similar) interfaces. 
The proof of correctness or even precision of a generated algorithm with 
respect to a specific property reduces to checking the premises of a few 
theorems. This considerably eases the construction of analyses within 
a specific paradigm as well as the switch between and the transfer of 
analyses to other paradigms. 

Keywords: Program optimization, abstract interpretation, data-flow 
analysis (DFA), DFA-frameworks, DFA-generators, coincidence theorems, 
intraprocedural, interprocedural, parallel, object-oriented, conditional DFA. 



1 Motivation 

Static program analysis — in the context of optimizing compilers usually called 
data-flow analysis (DFA) — is an almost indispensable prerequisite for the ap- 
plication of performance improving transformations by optimizing compilers (cf. 
[1,11,33,34,35]). Typical questions to be answered by DFA in order to enable clas- 
sical optimizations like code motion (cf. [32]), constant propagation (cf. [13]), or 
dead code elimination (cf. [1]) are whether a program term t has always been 
computed when reaching a specific program point (“Is t available?”), whether its 
evaluation always yields the same constant value there (“Is the value of t a 
constant?”), or whether a variable v will not be used on any program continuation 
leaving this point without a preceding redefinition (“Is v dead?”). 
Fig. 1. The general structure of a DFA-framework. [Diagram labels: Interface, Generator, Fixed Point Alg.] 



After fixing the relevant program property, DFA-designers are thus typically 
faced with two problems: first, inventing an algorithm for computing the set 
of program points enjoying the property under consideration; second, proving 
that it computes this set precisely. A common observation here is that the more 
powerful and expressive the features provided by the underlying programming 
language (procedures, parallelism, objects, polymorphism, etc.), the more 
constraints a DFA-designer must respect that are imposed by the features of the 
programming language rather than by the property considered. This is 
quite important because these constraints tremendously influence the technical 
complexity of the algorithms and of the proofs of their correctness and precision. 
Consequently, the more powerful and expressive the programming language, the 
less adequate it is to construct DFA-algorithms by means of ad-hoc techniques, 
because this usually amounts to inventing an individual solution for the 
considered analysis problem, both on the algorithmic and on the proof side. 

In this article we reconsider the design process of DFA-algorithms under 
the unifying view of abstract interpretation. This leads us to a uniform mul- 
tiparadigm approach, which in addition suggests a principle for the automatic 
generation of DFA-algorithms. To this end we first reduce the standard frame- 
work for intraprocedural DFA to its abstract kernel. Considering it then from a 
generic point of view, we demonstrate that it has instances in intraprocedural, 
interprocedural, (data-)parallel, explicitly parallel, object-oriented, and 
conditional DFA; hence, it applies to quite different paradigms. Moreover, this 
approach provides DFA-designers with strong support on both the practical and 
the theoretical side. On the theoretical side, precise guidelines can be set up 
which structure and simplify the development of DFA-algorithms as well as the 
proofs of their correctness or even precision. On the practical side, 
DFA-generators can be distilled from the frameworks, allowing the automatic 
generation of DFA-algorithms from concise specifications. Though the concrete 
DFA-frameworks and their respective DFA-generators differ in their details for 
such different programming paradigms, from the perspective of a DFA-designer, 
they can be considered black-boxes sharing almost the same interface. In fact, 
knowing their (quite similar) interfaces is sufficient for the successful and rapid 
development of proven correct DFA-algorithms. 

Overview. Figure 1 illustrates the essence of our approach from the point of 
view of a DFA-designer. In this figure φ is assumed to denote the property of 
interest. Fundamental for computing the set of program points enjoying φ is the 
theory of abstract interpretation (cf. [5,6,7,30,36]). It provides a well-founded 
basis for DFA. The point here is to replace the “full” semantics of a program 
by a simpler, more abstract version, which is tailored to the problem under 
consideration. Usually, the abstract semantics is defined by a (local) semantic 
functional, which gives abstract meaning to the (elementary) statements of a 
program in terms of transformations on a complete lattice. Its elements represent 
the data-flow facts of interest. A local abstract semantics induces two notions 
of a solution of the respective DFA-problem, which result from two different 
globalization approaches: the MOP-solution of the “operational” meet-over-all- 
paths (MOP) approach, and the MFP-solution of the “denotational” maximal- 
fixed-point (MFP) approach. 

The MOP-solution mimics the effect of possible program executions: for every 
program point it is the “meet” (intersection) of all data-flow facts contributed by 
program paths reaching it. Usually, the MOP-solution is conceptually quite close 
to the program property of interest, but in general the underlying MOP-approach 
is not effective. It is thus specificational in nature. In contrast, the MFP-solution 
is defined as the greatest solution of a system of equations imposing consistency 
constraints on an annotation of the program with data-flow facts: in essence, 
the data-flow fact attached to a program point must be implied by the results 
of transforming the information attached to its predecessors according to their 
abstract meaning. 
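The two notions of solution can be written down as follows. This is a sketch in one common formulation for a flow graph G = (N, E, s, e); the notation (complete lattice 𝒞 with meet ⊓, local semantic functional [[·]] : N → (𝒞 → 𝒞), start fact c_s, predecessor function pred, and P[s, n[ for the set of paths from s to n with the final node n itself omitted) is assumed for illustration and need not match the authors' exact definitions. inf* denotes the greatest solution of the equation system.

```latex
\[
  [\![ \langle n_1,\dots,n_q \rangle ]\!] = [\![ n_q ]\!] \circ \dots \circ [\![ n_1 ]\!],
  \qquad [\![ \varepsilon ]\!] = \mathit{id}_{\mathcal{C}}
\]
\[
  \mathit{MOP}(n) = \bigsqcap \{\, [\![ p ]\!](c_s) \mid p \in \mathbf{P}[s,n[ \,\}
\]
\[
  \mathit{inf}(n) =
  \begin{cases}
    c_s & \text{if } n = s,\\
    \bigsqcap \{\, [\![ m ]\!](\mathit{inf}(m)) \mid m \in \mathit{pred}(n) \,\} & \text{otherwise,}
  \end{cases}
  \qquad
  \mathit{MFP}(n) = \mathit{inf}^{*}(n)
\]
```

In this formulation the start node is assumed to have no incoming edges, so MOP(s) = c_s via the empty path. If every [[n]] is monotone, every solution of the equation system, and in particular the MFP-solution, lies pointwise below the MOP-solution; equality is what a coincidence theorem establishes under additional premises.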

In contrast to the MOP-approach, the MFP-approach is practically relevant. 
It induces a generic fixed point algorithm, which (under specific side-conditions) 
terminates with the MFP-solution. As shown in Figure 1, this algorithm can di- 
rectly be fed with a local abstract semantics: the concrete DFA-algorithm results 
automatically from instantiating the generic algorithm by the DFA-specification 
under consideration, and need not be implemented by the DFA-designer. The 
DFA-designer is only left with proving that the generated algorithm precisely 
computes the set of program points enjoying φ. This can be proved in only three 
independent steps, which are based on properties of the specification only. 
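The generic fixed point algorithm can be sketched as a standard worklist iteration in the style of Kildall's algorithm; whether the authors' generators use exactly this iteration strategy is not stated here. It is parameterised only by the specification: the transfer function of each node, the meet, the top element and the start fact. All names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

Fact = TypeVar("Fact")
Node = Hashable


def mfp(
    nodes: List[Node],
    edges: Iterable[Tuple[Node, Node]],
    start: Node,
    transfer: Dict[Node, Callable[[Fact], Fact]],
    meet: Callable[[Fact, Fact], Fact],
    top: Fact,
    start_fact: Fact,
) -> Dict[Node, Fact]:
    """Greatest solution of the MFP equation system by worklist iteration.

    inf(start) = start_fact
    inf(n)     = meet of transfer[m](inf(m)) over all predecessors m of n

    Every entry starts at top and can only move down the lattice. The loop
    terminates if all transfer functions are monotone and the lattice has
    no infinite strictly descending chains (e.g. it has finite height)."""
    succs: Dict[Node, List[Node]] = {n: [] for n in nodes}
    for m, n in edges:
        succs[m].append(n)
    inf: Dict[Node, Fact] = {n: top for n in nodes}
    inf[start] = start_fact
    worklist = deque(nodes)          # every node is processed at least once
    while worklist:
        m = worklist.popleft()
        out = transfer[m](inf[m])
        for n in succs[m]:
            if n == start:
                continue             # the start information is fixed
            new = meet(inf[n], out)
            if new != inf[n]:
                inf[n] = new
                worklist.append(n)   # n's successors must see the new fact
    return inf
```

For example, instantiating it with sets of expressions, intersection as meet, the set of all expressions as top, and gen/kill transfer functions yields an available-expressions analysis. On a graph 1 → 2, 2 → 3, 3 → 2, 2 → 4 in which node 1 computes a+b and node 3 computes c*d, it reports a+b as available at nodes 2, 3 and 4; starting from top rather than bottom is what makes the result the greatest solution around the loop.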
1. Equivalence: prove that the program property φ under consideration is equiv- 
alent to the MOP-solution of the DFA-problem specified (1). 

2. Coincidence: prove that the MOP-solution and the MFP-solution of the 
DFA-problem considered coincide (2). 

3. Effectivity: prove that the automatically generated DFA-algorithm termi- 
nates (3b) with the MFP-solution (3a). 

The gap which must be bridged in the first step is usually quite small, 
because typically both φ and the MOP-solution are defined in terms of the effect 
of program paths over the same basic properties. This gap can be bridged by a 
usually straightforward induction on the length of program paths. 

The second step is the central one of this proof sequence. It has to bridge 
the gap between the theoretical part of the DFA-framework and its respective 
DFA-generator. The glue combining them is a coincidence theorem establishing 
the coincidence of the conceptually quite different MOP- and MFP-solutions. The 
coincidence theorem gives a sufficient condition for the coincidence of the MOP- 
solution, which constitutes the requested reference solution of a DFA-problem, 
and the MFP-solution, which is computed by the generated DFA-algorithm. In 
fact, the coincidence theorem can be considered the theoretical backbone of a 
DFA-framework. 

In the third step, finally, it must be verified that the generated DFA-algorithm terminates with the MFP-solution. Similar to the coincidence theorem, an effectivity theorem gives a sufficient condition for this. It can be checked knowing only the DFA-specification.

Benefits. The proof obligations of the second and third step only require checking the premises of a coincidence theorem and an effectivity theorem. In general, this is much simpler than establishing the corresponding results for an algorithm invented afresh for a problem. Moreover, the concrete DFA-algorithm, which decides the program property of interest, comes for free in this approach: it is the algorithm that results automatically from instantiating the generic algorithm of the framework with the DFA-specification under consideration. This is particularly important because the specification interface and the proof obligations remain essentially the same, even though the internal structure of the DFA-frameworks becomes more complex as the features of the programming language are enriched. Thus, the more powerful the programming language a framework is designed for, the greater the benefits of applying it. In fact, though the details of the frameworks for intraprocedural, interprocedural, parallel, or conditional DFA are quite different, the DFA-designer can think of and apply them as black boxes: a framework together with its DFA-generator is a black box which accepts, in a specific format, a DFA-specification tuned to the program property of interest. It returns an algorithm which computes the set of program points enjoying this property, provided that the three proof obligations labeled equivalence, coincidence, and effectivity are discharged, as illustrated in Figure 2.
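To make the black-box view concrete, here is a minimal sketch of such a generator; it is our own illustration, not the authors' implementation. A specification consists of a meet operation, the top element of the lattice, and a local abstract semantics for edges, and the generator returns a solver that computes the greatest solution of the MFP equation system with a standard worklist iteration. The names `DFASpec` and `generate` are hypothetical, and termination assumes a lattice of finite height and monotonic local functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Hashable, List, Tuple, TypeVar

L = TypeVar("L")                 # lattice elements (data-flow facts)
Node = Hashable
Edge = Tuple[Node, Node]


@dataclass
class DFASpec(Generic[L]):
    """Hypothetical specification interface supplied by the DFA-designer."""
    meet: Callable[[L, L], L]                  # merge of facts at join nodes
    top: L                                     # greatest element of the lattice
    local: Callable[[Edge], Callable[[L], L]]  # local abstract semantics [[e]]


def generate(spec: DFASpec) -> Callable[..., Dict[Node, L]]:
    """Hypothetical DFA-generator: turns a specification into an MFP solver."""

    def solve(nodes: List[Node], edges: List[Edge], start: Node, c0: L) -> Dict[Node, L]:
        preds: Dict[Node, List[Edge]] = {n: [] for n in nodes}
        succs: Dict[Node, List[Node]] = {n: [] for n in nodes}
        for (m, n) in edges:
            preds[n].append((m, n))
            succs[m].append(n)
        df: Dict[Node, L] = {n: spec.top for n in nodes}
        df[start] = c0                         # df(s) = c0
        worklist = list(nodes)
        while worklist:
            n = worklist.pop()
            if n == start:
                continue                       # the start equation is fixed
            new = spec.top                     # meet over the empty set is top
            for (m, k) in preds[n]:
                new = spec.meet(new, spec.local((m, k))(df[m]))
            if new != df[n]:
                df[n] = new
                worklist.extend(succs[n])      # successors must be re-evaluated
        return df

    return solve


# Usage: an availability-like instance on a small diamond-shaped graph.
sem = {
    (0, 1): lambda _: True,      # Const_tt: term computed, operands untouched
    (0, 2): lambda b: b,         # Id: transparent edge without computation
    (1, 3): lambda b: b,
    (2, 3): lambda b: b,
}
availability = DFASpec(meet=lambda a, b: a and b, top=True, local=lambda e: sem[e])
result = generate(availability)([0, 1, 2, 3], list(sem), 0, False)
print(result)  # {0: False, 1: True, 2: False, 3: False}
```

The designer only writes the three fields of the specification; graph traversal, worklist handling, and fixed point detection stay inside the generator.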

Summarizing, for a DFA-designer the major benefits are as follows: 
Fig. 2. The black-box view of DFA-frameworks.



— Information hiding: all details which are not relevant for a particular appli- 
cation are hidden. 

— Automatic generation of DFA-algorithms: the concrete DFA-algorithm for a 
DFA-problem results automatically from a concise specification in terms of 
an abstract interpretation. 

— Precise proof obligations: proving the generated algorithm to be precise for 
the property of interest requires only knowledge of the specification. 

And last but not least: 

— Uniformity: applicability of the overall approach to a broad range of programming paradigms.

In the remainder of this article we focus on the last point. We demonstrate 
that the benefits summarized above, which have previously been demonstrated 
for intraprocedural and interprocedural DFA (cf. [14,15,24]), can be realized for 
further paradigms, too, including object-oriented, parallel, and conditional DFA. 



Structure of the Article. In Section 2 we reconsider the intraprocedural base case and illustrate the essence of our approach by discussing the internal structure of the standard intraprocedural DFA-framework. Subsequently, we demonstrate that the pattern of the intraprocedural case carries over to interprocedural, data-parallel and object-oriented, explicitly parallel, and conditional DFA. In fact, demonstrating the generality of the underlying pattern, rather than applying the frameworks to concrete DFA-problems, is a central concern of this article. Thus, the presentation deliberately remains on a conceptual level, taking a user's point of view in order to demonstrate that the DFA-designer is offered almost the same interface independently of the paradigm and the specific setting considered. Additionally, we put emphasis on the coincidence theorems underlying the concrete DFA-frameworks. They are the theoretical backbone providing the key to the overall approach.

2 Intraprocedural Data-flow Analysis 

In this section we illustrate the essence of our approach by reconsidering the standard framework for intraprocedural DFA of imperative programs (cf. [12,13]). Figure 3 shows the intraprocedural instance of the “abstract” framework of Figure 1.

Fig. 3. The intraprocedural DFA-framework.



Intraprocedural DFA is characterized by a separate and independent investigation of the procedures of a program. Following [19] we assume that procedures are represented as directed edge-labeled flow graphs $G = (N, E, \mathbf{s}, \mathbf{e})$, whose nodes $n \in N$ represent program points, whose edges $e \in E$ represent the statements and the nondeterministic control flow of the underlying procedure, and where $\mathbf{s}$ and $\mathbf{e}$ denote two distinct program points, the so-called start and end node of $G$. In this setting a local abstract semantics specifying a DFA-problem is a functional $[\![\ ]\!] : E \to (\mathcal{C} \to \mathcal{C})$ which gives abstract meaning to the statements of the procedure in terms of transformation functions on a set of data-flow facts, usually a complete lattice $\mathcal{C}$ of finite height.¹ The following straightforward extension of $[\![\ ]\!]$ to finite program paths $p = (e_1, \ldots, e_q)$, where $\mathit{Id}_{\mathcal{C}}$ denotes the identity on $\mathcal{C}$, is the key to the meet-over-all-paths globalization:

$$[\![ p ]\!] =_{df} \begin{cases} \mathit{Id}_{\mathcal{C}} & \text{if } p \text{ is the ``empty'' path} \\ [\![ (e_2, \ldots, e_q) ]\!] \circ [\![ e_1 ]\!] & \text{otherwise} \end{cases}$$

¹ In [31] it is thus called a lattice framework.



Denoting the set of all program paths reaching a program point $n$ by $\mathbf{P}[\mathbf{s}, n]$, the MOP-solution with respect to a local abstract semantics $[\![\ ]\!]$ is defined by:

The MOP-Solution: $\forall c_0 \in \mathcal{C}\ \forall n \in N.\ \mathit{MOP}(n) =_{df} \bigsqcap \{ [\![ p ]\!](c_0) \mid p \in \mathbf{P}[\mathbf{s}, n] \}$

In contrast, the MFP-solution is defined via the greatest solution of the following equation system:

$$\mathit{df}(n) = \begin{cases} c_0 & \text{if } n = \mathbf{s} \\ \bigsqcap \{ [\![ (m, n) ]\!](\mathit{df}(m)) \mid m \text{ is a predecessor of } n \} & \text{otherwise} \end{cases}$$

Let $\mathit{df}_{c_0}$ denote the greatest solution of this equation system with respect to the start information $c_0$. Then the MFP-solution is defined by:

The MFP-Solution: $\forall c_0 \in \mathcal{C}\ \forall n \in N.\ \mathit{MFP}(n) =_{df} \mathit{df}_{c_0}(n)$

The well-known (intraprocedural) Coincidence Theorem of Kildall [13], and Kam and Ullman [12] gives a sufficient condition for the coincidence of the MOP-solution and the MFP-solution in this setting.

Theorem 1 (Intraprocedural Coincidence Theorem).

The (intraprocedural) MFP- and MOP-solution coincide if the local semantic functions of the abstract interpretation are all distributive.²
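The role of distributivity in Theorem 1 can be observed on a small acyclic graph. The sketch below is our own illustration: it uses a hypothetical two-branch program over a constant-propagation lattice (each variable maps to an integer or to `NAC`, "not a constant"), computes the MOP-solution by enumerating all paths and composing edge functions as in the definition of ⟦p⟧, and computes the MFP-solution by one pass over the nodes in topological order. Constant propagation is a standard example of a non-distributive analysis, so the two solutions differ after the join.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple, Union

NAC = "NAC"                       # hypothetical "not a constant" value
Env = Dict[str, Union[int, str]]  # one abstract value per variable
Edge = Tuple[int, int]


def meet(a: Env, b: Env) -> Env:
    """Pointwise meet: agreeing constants survive, everything else becomes NAC."""
    return {v: a[v] if a[v] == b[v] else NAC for v in a}


def assign_sum(env: Env) -> Env:
    """Transfer function of z := x + y; monotonic but not distributive."""
    x, y = env["x"], env["y"]
    return {**env, "z": NAC if NAC in (x, y) else x + y}


EDGES: Dict[Edge, Callable[[Env], Env]] = {
    (0, 1): lambda env: {**env, "x": 1, "y": 2},
    (0, 2): lambda env: {**env, "x": 2, "y": 1},
    (1, 3): lambda env: env,
    (2, 3): lambda env: env,
    (3, 4): assign_sum,
}
START: Env = {"x": NAC, "y": NAC, "z": NAC}


def paths(src: int, dst: int) -> List[List[Edge]]:
    """All edge sequences from src to dst; the example graph is acyclic."""
    if src == dst:
        return [[]]
    return [[(m, n)] + rest for (m, n) in EDGES if m == src for rest in paths(n, dst)]


def run(path: List[Edge], env: Env) -> Env:
    """[[p]]: apply the edge functions along p, first edge first."""
    for e in path:
        env = EDGES[e](env)
    return env


def mop(n: int) -> Env:
    return reduce(meet, (run(p, START) for p in paths(0, n)))


def mfp(order: Tuple[int, ...] = (0, 1, 2, 3, 4)) -> Dict[int, Env]:
    df: Dict[int, Env] = {order[0]: START}
    for n in order[1:]:           # topological order: predecessors are already final
        df[n] = reduce(meet, (f(df[m]) for (m, k), f in EDGES.items() if k == n))
    return df


print(mop(4)["z"], mfp()[4]["z"])  # 3 NAC
```

Both paths compute z = 3, so the MOP-solution keeps the constant, whereas the MFP-solution merges x and y at node 3 before the addition and loses it. The MFP value lies below the MOP value, as expected for monotonic functions.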

As an example we consider the availability of a program term t. This is a classical (cf. [11]) and practically relevant (cf. [22,32]) DFA-problem, where the set of data-flow facts is given by the lattice $\mathbb{B}$ of Boolean truth values tt and ff with ff ⊑ tt. Intuitively, t is available at a program point n if on every program path reaching n the last modification of one of t's operands is followed by a computation of t. This is illustrated by the program of Figure 4(a). Figure 4(b) highlights the program points where a + b is available, and Figure 4(c) shows the program points where c + b is available.

The DFA solving the availability problem is specified by the local abstract semantics

$$\forall e \in E.\ [\![ e ]\!] =_{df} \begin{cases} \mathit{Const}_{tt} & \text{if } \mathit{Transp}_t(e) \wedge \mathit{Comp}_t(e) \\ \mathit{Id}_{\mathbb{B}} & \text{if } \mathit{Transp}_t(e) \wedge \neg \mathit{Comp}_t(e) \\ \mathit{Const}_{ff} & \text{otherwise} \end{cases}$$

² A function $f : \mathcal{C} \to \mathcal{C}$ is called distributive iff $\forall C' \subseteq \mathcal{C}.\ f(\bigsqcap C') = \bigsqcap \{ f(c) \mid c \in C' \}$. It is called monotonic iff $\forall C' \subseteq \mathcal{C}.\ f(\bigsqcap C') \sqsubseteq \bigsqcap \{ f(c) \mid c \in C' \}$. Hence, monotonicity is a weaker requirement than distributivity. For monotonic semantic functions the MFP-solution is a safe approximation of its MOP-counterpart, i.e., MFP ⊑ MOP; a fact which holds for the other coincidence theorems given in the course of this article, too.




From DFA-Frameworks to DFA-Generator 







Fig. 4. Illustrating availability in the intraprocedural setting. 



where $\mathit{Id}_{\mathbb{B}}$ denotes the identity, and $\mathit{Const}_{tt}$ and $\mathit{Const}_{ff}$ the constant functions on $\{tt, ff\}$, respectively. Moreover, $\mathit{Comp}_t$ and $\mathit{Transp}_t$ are two local predicates defined for the statements of the procedure under consideration. They are true if t is computed along edge e, and if no operand of t is modified along e, respectively. Obviously, all semantic functions are distributive. Hence, the MOP-solution and the MFP-solution coincide, which proves the central step of verifying that the DFA-algorithm automatically generated from this specification precisely computes the set of program points where term t is available.
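The availability specification is small enough to check mechanically. In the sketch below, which is our own illustration rather than the generator's input format, tt and ff are modelled as Python booleans, the local predicates Comp_t and Transp_t are passed in as plain booleans per edge, and `local_semantics` and `is_distributive` are hypothetical helper names. The check enumerates the non-empty subsets of {tt, ff}, mirroring the binary form f(a ⊓ b) = f(a) ⊓ f(b) of distributivity.

```python
from itertools import combinations
from typing import Callable, Iterable

Fact = bool                                  # tt = True, ff = False, with ff below tt
LocalFn = Callable[[Fact], Fact]

CONST_TT: LocalFn = lambda _: True
CONST_FF: LocalFn = lambda _: False
ID_B: LocalFn = lambda b: b


def local_semantics(transp: bool, comp: bool) -> LocalFn:
    """[[e]] for the availability of a term t, given Transp_t(e) and Comp_t(e)."""
    if transp and comp:
        return CONST_TT
    if transp:
        return ID_B
    return CONST_FF


def meet_all(facts: Iterable[Fact]) -> Fact:
    return all(facts)                        # the meet on {tt, ff} is conjunction


def is_distributive(f: LocalFn) -> bool:
    """Check f(meet C') == meet {f(c) | c in C'} for every non-empty C' of {tt, ff}."""
    universe = [True, False]
    for k in range(1, len(universe) + 1):
        for subset in combinations(universe, k):
            if f(meet_all(subset)) != meet_all(f(c) for c in subset):
                return False
    return True


cases = {(t, c): local_semantics(t, c) for t in (True, False) for c in (True, False)}
print(all(is_distributive(f) for f in cases.values()))  # True
```

On a chain such as {tt, ff}, every monotonic function is distributive over non-empty sets, since the meet of a finite non-empty subset is its least element; the same check only becomes informative on richer lattices.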

Remark 1. Note that in abstract interpretation correctness has both a “horizontal” and a “vertical” aspect. The coincidence theorem above (and those considered in the following sections) reflects the horizontal aspect: it is concerned with the precision of a fixed point solution (“MFP”) computed with respect to a desired reference solution (“MOP”), both of which refer to the same level of abstraction, fixed by the local abstract semantics they share. In contrast, the vertical aspect concerns the correctness of the abstract semantics induced by its underlying local abstract semantics with respect to a reference semantics, usually the “concrete” program semantics or some other abstract semantics, e.g., the so-called static semantics (cf. [5]). We do not consider the vertical aspect here. It is an orthogonal issue, which cannot meaningfully be considered on the level of abstraction of the current presentation.

3 Interprocedural Data-flow Analysis 

Figure 6 shows the interprocedural instance of the DFA-framework of Figure 1. 
Note that the internal structure of the framework and its corresponding gen- 
erator is more complicated. However, the user interface is almost the same: in 
comparison to the intraprocedural setting only a single component has been 
added. In the following we discuss the differences in more detail. 

Interprocedural DFA takes the semantics of procedure calls into account. For the interprocedural version of the MOP-approach this requires that only interprocedurally valid paths are taken into account, i.e., paths respecting the call/return behaviour of procedure calls (cf. [37]). Whereas the respective extension of the intraprocedural MOP-approach is rather straightforward, the interprocedural extension of the MFP-approach requires additional care. The key of this extension is a preprocess which computes the semantics of procedure calls according to the local abstract semantics of the considered DFA-problem. This preprocess is realized by a second generic algorithm. As a consequence (cf. Figure 6), the internal structure of the DFA-framework is more complicated than its intraprocedural counterpart. However, the specification interface and the proof obligations remain essentially the same. Only a return functional $\mathcal{R}$ is additionally required. It is the handle to properly deal with local variables of recursive procedures. Intuitively, the point here is that effects on global variables must be maintained after returning from a recursive call, whereas local variables must be reset to their values at call time. This is illustrated in the example of Figure 5(a), using the availability of program terms. While c + b is available at the program point following the recursive call of $\pi_1$ in procedure $\pi_1$, a + b is not. The difference lies in the fact that in the case of a + b a global operand is modified within the recursive call, while for c + b it is a local one. Return functions extract this information from the data-flow information valid at call time and valid immediately before leaving the called procedure, which is stored in a DFA-stack mimicking the run-time stack. This is discussed in detail in [15,24,26].
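The intuition behind the return functional can be sketched on a per-variable representation of data-flow information. The snippet below is a simplified illustration under assumptions of our own, not the return functions of [15,24,26]: facts are hypothetical dictionaries from variable names to abstract values, `GLOBALS` is an assumed set of global variable names, and reference parameters are ignored. Information about global variables is taken from the callee's exit, whereas information about the caller's local variables is restored from the DFA-stack entry pushed at call time.

```python
from typing import Dict, FrozenSet, List

Facts = Dict[str, object]                     # hypothetical: one abstract value per variable
GLOBALS: FrozenSet[str] = frozenset({"a", "b"})


def return_fn(at_call: Facts, at_exit: Facts) -> Facts:
    """Globals keep the callee's effect; locals are reset to their call-time facts."""
    return {v: (at_exit[v] if v in GLOBALS else at_call[v]) for v in at_call}


class DFAStack:
    """Mimics the run-time stack: one call-time entry per pending call."""

    def __init__(self) -> None:
        self._frames: List[Facts] = []

    def call(self, facts: Facts) -> None:
        self._frames.append(dict(facts))

    def ret(self, exit_facts: Facts) -> Facts:
        return return_fn(self._frames.pop(), exit_facts)


stack = DFAStack()
stack.call({"a": 1, "b": 2, "c": 3})                  # facts before the recursive call
after = stack.ret({"a": "NAC", "b": 2, "c": "NAC"})   # callee changed a and its own c
print(after)  # {'a': 'NAC', 'b': 2, 'c': 3}
```

The callee's c is a different incarnation of the local variable, so its exit information is discarded for c, while the modification of the global a survives the return.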




Fig. 5. Illustrating availability in the interprocedural and parallel setting. 



The interprocedural variant of the intraprocedural coincidence theorem presented next captures programs with mutually recursive procedures, global and local variables, and value and reference parameters [24,26].³

³ Sharir and Pnueli were the first to present an interprocedural extension of the intraprocedural coincidence theorem (cf. [37]). Their version, however, did not capture local variables and parameters of recursive procedures. The version presented in [15] even captures procedural parameters.

Fig. 6. The interprocedural DFA-framework.

Theorem 2 (Interprocedural Coincidence Theorem).

The interprocedural MFP- and MOP-solution coincide if the local semantic functions and the return functions of the abstract interpretation are all distributive.

A collection of applications of the framework of Figure 6, including the interprocedural counterpart of the availability problem considered in Section 2, can be found, for example, in [15] and [24].

4 Data-parallel and Object-oriented Data-flow Analysis

The interprocedural machinery considered in the previous section can rather straightforwardly be extended to data-parallel languages like High Performance Fortran (HPF) [8], Fortran D [9], or Vienna Fortran [39], and to object-oriented languages like Smalltalk [10] or Oberon [38]. In fact, Figure 6 can be considered an illustration of both the data-parallel and the object-oriented situation, too, and thus we do not present a separate figure here. In [20] this has been exploited for the data-parallel setting of HPF, considering distribution assignment placement (DAP) as application. This is a new aggressive optimization which reduces communication costs in HPF-programs by eliminating partially redundant and partially dead (re-)distributions.⁴ The DFAs required for DAP, which resemble the one for availability, were specified according to the pattern of Figure 6 of Section 3. In [18] and [25] the approach has been extended to an object-oriented setting set up by a Smalltalk- and an Oberon-like language, considering type analysis as application. In all these cases a coincidence theorem relating the computational and the specificational part of the framework is crucial. For illustration we recall here the coincidence theorem of the DFA-framework fitting the object-oriented setting considered in [18].

Theorem 3 (The Object-oriented Coincidence Theorem). 

The object-oriented MFP- and MOP-solution coincide if the local semantic functions (including the filter functions)⁵ of the abstract interpretation are all distributive.

5 Parallel Data-flow Analysis 

In this section we consider explicitly parallel programs with interleaving semantics and shared memory. In this setting one is faced with the phenomena of interference and synchronization. Figure 5(b) illustrates their impact by contrasting a sequential and a parallel program, using the availability of a + b for demonstration. Of course, parallel programs can equivalently be expressed by a sequential “product” program which makes all the interleavings explicit. Though this would allow us to directly apply the results of intraprocedural DFA, it would not be of much practical use, as the size of the product program is exponential in the number of parallel components: a dilemma often condensed into the catch-phrase “state explosion problem.” For the large and practically most important class of bitvector problems (cf. [11]), however, it has been shown that interleavings need not be considered at all to capture the effects of interference and synchronization (cf. [27,29]). This allows a two-step approach similar to the interprocedural case. The key of the MFP-approach in the parallel setting is a preprocess which computes the semantics of parallel statements in an innermost fashion. As in the interprocedural case, the designer of a (bitvector) DFA need not know any details of this process when applying the framework: the treatment of interference and synchronization can be encapsulated inside the framework and its corresponding generator. For the parallel setting we have the following version of the coincidence theorem [27,29]. It applies to bitvector problems; however, extensions to specific non-bitvector problems are possible (cf. [17]).

Theorem 4 (Parallel Bitvector Coincidence Theorem). 

The parallel MFP- and MOP-solution for bitvector problems coincide.
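The observation that interleavings need not be enumerated can be checked by brute force in the simplest case. The sketch below rests on simplifying assumptions of our own: each parallel component is a straight-line sequence of the bitvector functions Const_tt, Const_ff, and Id on {tt, ff} (encoded as the strings "tt", "ff", "id"), there is no synchronization, and only the effect at the end of the parallel statement is computed. It compares the meet over all interleavings with a per-component summary, exploiting that a composition of such functions is determined by its last non-identity function. It illustrates the idea only; the treatment of interference and synchronization in [27,29] is more general.

```python
from itertools import permutations, product
from typing import Iterator, List, Optional

Fn = str                 # "tt" = Const_tt, "ff" = Const_ff, "id" = Id
Component = List[Fn]     # a straight-line parallel component (assumption)


def apply_seq(seq: List[Fn], b: bool) -> bool:
    """Apply a sequence of bitvector functions to b, first element first."""
    for f in seq:
        if f == "tt":
            b = True
        elif f == "ff":
            b = False
    return b


def interleavings(components: List[Component]) -> Iterator[List[Fn]]:
    """All merges of the components that preserve each component's own order."""
    tags = [i for i, comp in enumerate(components) for _ in comp]
    for order in set(permutations(tags)):
        pos = [0] * len(components)
        seq: List[Fn] = []
        for i in order:
            seq.append(components[i][pos[i]])
            pos[i] += 1
        yield seq


def brute_force(components: List[Component], b: bool) -> bool:
    """Meet (conjunction) over all interleavings; exponential in general."""
    return all(apply_seq(seq, b) for seq in interleavings(components))


def last_non_id(comp: Component) -> Optional[Fn]:
    return next((f for f in reversed(comp) if f != "id"), None)


def summary(components: List[Component], b: bool) -> bool:
    """Per-component summary: only each component's last non-identity function matters."""
    lasts = [f for f in map(last_non_id, components) if f is not None]
    if not lasts:
        return b                      # the whole parallel statement acts like Id
    return all(f == "tt" for f in lasts)


for c1, c2 in product(product(["tt", "ff", "id"], repeat=2), repeat=2):
    for start in (True, False):
        comps = [list(c1), list(c2)]
        assert brute_force(comps, start) == summary(comps, start)
print("summary agrees with every interleaving")
```

Any component can be scheduled to finish last, so each component's final non-identity function is a possible overall effect, and no other function can be; the summary therefore needs one linear scan per component instead of an exponential number of interleavings.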

Following the pattern of Figure 7, all the bitvector analyses (e.g., availability and very busyness of terms, reaching definitions, or liveness of variables) which have originally been developed in the sequential imperative paradigm (cf. [11]) can now be transferred to the parallel setting, together with the optimizations based thereon. In [28] and [16] this has been demonstrated for code motion (cf. [22,32]) and partial dead-code elimination (cf. [23]), respectively.

⁴ In first practical measurements this optimization proved to be most powerful: speed-ups of several hundred per cent have often been observed (cf. [20,21]).

⁵ See Section 6.

Fig. 7. The parallel DFA-framework.

6 Conditional Data-flow Analysis 

Conditional branches are usually interpreted nondeterministically in DFA in order to avoid undecidabilities. The framework of abstract interpretation, however, is inherently powerful enough to deal properly with conditional branching, too. Here, we demonstrate this for the intraprocedural setting. Technically, this can be achieved by introducing filter functions of the form

$f_c : \mathcal{C} \to \mathcal{C}$ defined by $\forall c' \in \mathcal{C}.\ f_c(c') =_{df} c \sqcup c'$

where $\sqcup$ is the lattice operator dual to the one modelling the “merge” of data-flow information at join nodes of the control flow. Intuitively, a filter function matching the pattern above enriches the current data-flow information by the data-flow facts which are guaranteed by the particular program branch taken. After introducing filter functions, the DFA-process proceeds as in the intraprocedural case. However, due to the special nature of the filter functions we need the following version of the coincidence theorem here.

Theorem 5 (Conditional Coincidence Theorem). 

The conditional MFP- and MOP-solution coincide if the lattice, the local semantic functions, and the filter functions of the abstract interpretation are all distributive.
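A filter function of the form above can be sketched on a hypothetical powerset lattice of facts, in which the merge at join nodes is set intersection (⊓ = ∩) and its dual is union (⊔ = ∪). The fact strings and the branch condition are invented for illustration. Because a powerset lattice is distributive, each filter distributes over ∩, which is the kind of premise Theorem 5 asks for; the snippet checks this on a few sample sets.

```python
from itertools import combinations
from typing import Callable, FrozenSet

Facts = FrozenSet[str]


def make_filter(guaranteed: Facts) -> Callable[[Facts], Facts]:
    """f_c(c') = c joined with c': add the facts guaranteed by the branch taken."""
    return lambda current: guaranteed | current


# Hypothetical branch `if x == 0`: each outgoing edge guarantees one fact.
on_true = make_filter(frozenset({"x==0"}))
on_false = make_filter(frozenset({"x!=0"}))

incoming = frozenset({"y==1"})
print(sorted(on_true(incoming)))    # ['x==0', 'y==1']
print(sorted(on_false(incoming)))   # ['x!=0', 'y==1']

# Distributivity over the merge (intersection) on some sample sets.
samples = [frozenset(), frozenset({"y==1"}), frozenset({"y==1", "z==2"}), frozenset({"z==2"})]
for a, b in combinations(samples, 2):
    assert on_true(a & b) == on_true(a) & on_true(b)
```

Each branch edge gets its own filter, and the DFA then proceeds as before, with the filters applied on the corresponding edges.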

In [18] and [25], an alternative variant of filter functions has been introduced, aiming at an “almost” deterministic treatment of program branches and method calls. Basically, the filters introduced there propagate data-flow information (i.e., type information) only along program branches they are qualified for. Thus, in contrast to the filter functions sketched above, they do not enrich data-flow information according to the branching condition, but act like a sieve, letting pass only those parts of the information satisfying the (abstractly interpreted) branching condition. This is discussed in detail in [18,25]. For the purpose of this article it suffices that these filters fit the general pattern of Figure 1, which can be considered an abstraction of the corresponding instance of the object-oriented setting.
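For contrast, a sieve-like filter in the spirit just described can be sketched for type information. The class hierarchy, the `instanceof`-style condition, and all names below are hypothetical; the point is only that such a filter removes information instead of adding it. Along the branch where `x instanceof Number` holds, only those possible dynamic types of `x` that satisfy the condition are propagated, and the opposite branch keeps the rest.

```python
from typing import Callable, Dict, FrozenSet, Optional

# Hypothetical single-inheritance hierarchy: class -> direct superclass.
SUPER: Dict[str, Optional[str]] = {
    "Object": None, "Number": "Object", "Integer": "Number",
    "Float": "Number", "String": "Object",
}


def is_subclass(cls: str, ancestor: str) -> bool:
    c: Optional[str] = cls
    while c is not None:
        if c == ancestor:
            return True
        c = SUPER[c]
    return False


def instanceof_filter(target: str, negate: bool = False) -> Callable[[FrozenSet[str]], FrozenSet[str]]:
    """Sieve for the branch where `x instanceof target` holds (or fails, if negate)."""
    def sieve(types: FrozenSet[str]) -> FrozenSet[str]:
        return frozenset(t for t in types if is_subclass(t, target) != negate)
    return sieve


possible = frozenset({"Integer", "Float", "String"})   # possible dynamic types of x
print(sorted(instanceof_filter("Number")(possible)))               # ['Float', 'Integer']
print(sorted(instanceof_filter("Number", negate=True)(possible)))  # ['String']
```

Since a sieve of this shape is an intersection with a fixed set of admissible types, it distributes over the union that merges type information at join nodes.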

7 Conclusions 

The origins of DFA-frameworks based on abstract interpretation lie in the im- 
perative programming paradigm with the main focus on intraprocedural and in- 
terprocedural DFA. In this article we reconsidered this approach from a generic 
point of view. We showed that the resulting generic framework has instances in quite different programming paradigms, ranging from the classical imperative over the parallel and data-parallel to the object-oriented paradigm, which are becoming more and more important in practice. From the perspective of a DFA-designer this unifying approach makes it simpler to switch between paradigms as well as to transfer analyses across paradigm boundaries. Moreover, as a
by-product of our approach, we obtained a natural decomposition of the DFA- 
frameworks into a “theoretical” and “practical” part suggesting a uniform prin- 
ciple for the construction of DFA-generators. In each case the backbone of the 
decomposition is a specific coincidence theorem relating the solution computed 
by a DFA-algorithm to the solution specified. According to this principle DFA- 
generators (or tool kits) have already successfully been realized for intra- and 
interprocedural DFA, e.g., in terms of the DFA&OPT-METAFrame tool kit [14], 
and in similar form in the DFA-generator systems PAG (cf. [2]) and OPTIMIX 
(cf. [3,4]). As demonstrated here, these approaches can uniformly be extended to 
further paradigms and settings. An extension to parallel programs is integrated 
in the tool kit of [14], further extensions are in progress. 
Abstract. We present a theorem-prover based analysis tool for object-oriented database systems with integrity constraints. Object-oriented database specifications are mapped to higher-order logic (HOL). This allows us to reason about the semantics of database operations using a mechanical theorem prover such as Isabelle or PVS. The tool can be used to verify various semantic requirements of the schema (such as transaction safety, compensation, and commutativity) to support the advanced transaction models used in workflow and cooperative work.
We give an example of method safety analysis for the generic structure editing 
operations of a cooperative authoring system. 



1 Introduction 

Object-oriented specification methodologies and object-oriented programming have be- 
come increasingly important in the past ten years. Not surprisingly, this has recently led 
to an interest in object-oriented program verification in the theorem prover commu- 
nity, mainly using higher-order logic (HOL). Several different approaches to modelling 
object-oriented features in HOL have been presented [13,8]. These approaches emphasise the methods and behaviour of a single object. For an object-oriented database, a
different viewpoint is needed; a database typically includes integrity constraints over 
collections of objects that have a lifetime beyond an application program. Operations 
on the database transform it from one consistent state to another. In this paper, our point 
of view is the database state itself, and the persistent collection of objects it contains. 
We give a formal model for a persistent object store in HOL, which simulates the type- 
tagged memory structure of an implementation. This model is sufficient to describe the 
operational semantics of the typical features of an object-oriented database program- 
ming language, such as heterogeneous collections, inheritance, late binding, and nil 
values. 

Many recently proposed database systems rely on transaction models that require 
designers to provide various assertions about the semantics of their schema. Examples 
include (but are not limited to) consistency requirements (i.e., a method/transaction
has to preserve a number of static integrity constraints) [4], the correctness of undo 
methods (i.e., for each method another method has to be specified which compensates 
the effects of the method) [9], and commutativity tables (i.e., for each method pair, 
it has to be specified when two methods commute) [17]. Such knowledge about the
semantics of a schema is used by so-called advanced transaction models to provide more 
flexible mechanisms for concurrency control [12]. This is essential for many modern 
applications of database technology, such as workflow management and cooperative 
work. It is often assumed that database designers provide the required knowledge about 
the schema. This, however, is problematical, since people will make mistakes in their 
specifications: a seemingly trivial line of code, such as a nil-check, is easily forgotten, 
and it may lead to inconsistency of persistent data. 

In this paper, we describe a tool which can assist database designers in verifying the correctness of assertions about an object-oriented database schema. We outline a method (transaction) safety analysis framework based on theorem proving in higher-order logic (HOL). We show how to adapt the Isabelle/HOL theorem prover [10] for this task. We first define a general Isabelle theory of object-oriented systems. Using this theory, we show (1) how specific object-oriented schemas can be encoded in HOL, and (2) how proofs about these schemas can be performed using the Isabelle system.

The paper is organised as follows. An overview of our analysis framework, includ- 
ing a brief introduction to the Isabelle system, is given in Section 2. Section 3 introduces 
a database specification language, called OASIS, and a case study. Section 4 discusses 
the formal model of the persistent object store, in HOL, which is used to encode the semantics of specific database schemas. This model includes a number of generic (higher-order) operations and theorems about their combination for term-rewriting. The actual
representation of database-specific schema information in terms of these operations is 
discussed in Section 5. We show how typical object-oriented language features, such 
as heterogeneous collections, methods, late binding, and transactions, can be encoded. 
Section 6 shows how to extend the Isabelle tools to assist in reasoning about these 
schema representations. An example proof is discussed, which uses the framework for 
transaction safety analysis. Section 7 discusses related work on object-oriented analysis 
that makes use of theorem prover technology. Section 8 gives a summary and discusses 
future work. 



2 Architecture of the OASIS tool 



Our schema specification language is called OASIS (for Object AnalySIS). It includes 
facilities for constraint and query definition, object manipulation, and transaction defi- 
nition. The features of the object manipulation language are common to object-oriented 
database technology (e.g., late binding, inheritance, and heterogeneous collections [1]). 
The structure of the OASIS tool is similar to that of the LOOP tool [8]. The OASIS 
specification language is mapped by a schema translator to a simple formal model of 
objects in higher-order logic (HOL). This model resembles the type-tagged memory 
structure of an implementation and is sufficient to describe the operational semantics of 
the specification language. The reasoning component of the tool is implemented using 
the higher-order logic incarnation of the Isabelle theorem prover [10]. The two major 
components of the OASIS tool are described in more detail below: 
Extended Isabelle Theorem Prover. Isabelle is an open system, implemented in ML. Its HOL theory provides a formal theory of the standard data types one finds in databases, such as booleans, integers, characters, strings, tuples, lists, and sets. To reason about heterogeneous collections of objects with shared subcomponents, we extend these standard theories with a Generic OO Theory that simulates the type-tagged memory of an implementation. The theory defines functions that describe the effects of primitive update and retrieval operations on the object store (e.g., attribute update and attribute selection). We have derived a number of theorems about the interactions of these operations, which are used for the analysis of database methods and transactions.
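The idea of a type-tagged object store with primitive update and selection operations, and of theorems about how they interact, can be illustrated outside Isabelle. The Python sketch below is only an executable analogue under our own assumptions (object identifiers are integers, and each stored object carries a class tag and an attribute map); it is not the Generic OO Theory itself, and `new_object`, `select`, and `update` are hypothetical names. Stores are treated as immutable values, in the functional style of HOL, and the final assertions test two typical interaction properties on a sample store.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional, Tuple

Oid = int


@dataclass(frozen=True)
class Obj:
    tag: str                                            # run-time class tag
    attrs: Dict[str, object] = field(default_factory=dict)


Store = Dict[Oid, Obj]


def new_object(store: Store, tag: str, **attrs: object) -> Tuple[Store, Oid]:
    """Allocate a fresh identifier and return the extended store."""
    oid = max(store, default=0) + 1
    return {**store, oid: Obj(tag, dict(attrs))}, oid


def select(store: Store, oid: Optional[Oid], name: str) -> object:
    """Attribute selection; selecting through nil is an error."""
    if oid is None:
        raise ValueError("attribute selection on nil")
    return store[oid].attrs[name]


def update(store: Store, oid: Oid, name: str, value: object) -> Store:
    """Functional update: a new store in which only attribute `name` of `oid` changed."""
    obj = store[oid]
    return {**store, oid: replace(obj, attrs={**obj.attrs, name: value})}


s0: Store = {}
s1, n = new_object(s0, "CNode", name="root", position=0, size=0)
s2, m = new_object(s1, "ANode", name="leaf", position=1)
s3 = update(s2, n, "size", 1)

assert select(s3, n, "size") == 1                        # select after update
assert select(s3, m, "name") == select(s2, m, "name")    # other objects unaffected
assert select(s2, n, "size") == 0                        # the old store is unchanged
```

In the theorem prover such properties are proved once, for all stores, and then used as rewrite rules; here they are merely tested on one example.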

Schema Translator. The schema translator is directly implemented in ML. It maps a specific object-oriented schema to a low-level representation, defined in terms of the operations of the generic theory of objects. The input is an OASIS schema, in ASCII form, which is parsed and converted to an internal abstract syntax tree in ML. The output consists of two Isabelle files: a file with extension ‘.thy’ that gives definitions for the database-specific class structures, methods, transactions, and constraints; and an ML file with extension ‘.ML’ that contains some standard lemmas about the schema. These files can be loaded into an Isabelle session, and proofs about the schema can be initiated.



3 An example OASIS specification 

An OASIS database schema consists of a number of class definitions, named persis- 
tent roots, integrity constraints, and transactions. Classes (which can also be abstract) 
contain definitions of methods, written in a simple procedural update language. OASIS 
supports single inheritance. Persistent roots provide named entry points to the database; 
they can be used as global variables in methods, transactions, and integrity constraints. 
OASIS provides facilities for associating constraints with a schema. These constraints 
are boolean-valued query expressions over the database state. For queries, we use OQL 
(Object Query Language) [5]. 

Figure 1 shows part of a generic graph schema, which includes some basic struc- 
ture editing operations. This example is based on the implementation of the SEPIA 
document authoring system [16]. The schema defines abstract classes for Elements and 
Nodes. Atomic nodes (class ANode) and composite nodes (class CNode) are concrete 
classes, which are extensions of class Node. Link is another concrete class, which ex- 
tends the abstract class Element. If a class does not extend any other class, it implicitly 
extends the abstract class Object. The schema defines three named persistent roots: 
cnodes, links, and anodes. These are analogous to the attributes of the main class in 
object-oriented programming languages. 

Figure 1 also gives some example integrity constraints over the contents of the persistent roots. Constraints c1, c2, and c3 assert non-nil requirements. Constraint c4 asserts
abstract class Element {
    attribute string name;
    attribute int position;
    abstract boolean isConnectedTo(Element n);
};

abstract class Node extends Element {
    attribute set<Link> inLinks;
    attribute set<Link> outLinks;
};

class ANode extends Node {
    attribute set<string> content;
};

class CNode extends Node {
    attribute int size;
    attribute set<Element> elements;
    CNode(string s, int p, int z);
    boolean removeNodeOrLink(Element n);
    CNode createCNodeIn(int p, int z, string s);
};

class Link extends Element {
    attribute Node from;
    attribute Node to;
};

name set<CNode> cnodes;
name set<Link> links;
name set<ANode> anodes;

constraints {
    c1 : forall n in cnodes : n!=nil and (forall e in n.elements : e!=nil);
    c2 : forall n in links : n!=nil;
    c3 : forall n in anodes : n!=nil;
    c4 : forall cn in cnodes : forall e in cn.elements :
             ((e instanceof Link) implies
                 (((Link)(e).from in cn.elements) and ((Link)(e).to in cn.elements)));
    c5 : forall n1 in cnodes : forall n2 in cnodes :
             n1 == n2 or (forall n in n1.elements : not(n in n2.elements));
};



Fig. 1. Classes, Persistent Roots, and Constraints of the SEPIA Schema 
that all links in a composite node should link nodes within that same composite node. Constraint c5 asserts that elements are nested within at most one “parent” CNode object.
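To make the constraints of Figure 1 concrete, the following sketch evaluates c1 to c5 over a hypothetical in-memory snapshot. This is not how OASIS proceeds, since OASIS proves that methods preserve the constraints rather than testing individual states; the Python classes below are our own stand-ins that mirror only the attributes the constraints mention, `None` plays the role of nil, and `check` is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)                  # identity semantics, like object references
class Element:
    name: str = ""


@dataclass(eq=False)
class Node(Element):
    pass


@dataclass(eq=False)
class ANode(Node):
    pass


@dataclass(eq=False)
class Link(Element):
    frm: Optional[Node] = None        # `from` is a Python keyword
    to: Optional[Node] = None


@dataclass(eq=False)
class CNode(Node):
    elements: List[Optional[Element]] = field(default_factory=list)


def check(cnodes: List[Optional[CNode]], links: List[Optional[Link]],
          anodes: List[Optional[ANode]]) -> Set[str]:
    """Return the names of the violated constraints (empty set = consistent)."""
    violated: Set[str] = set()
    if not all(n is not None and all(e is not None for e in n.elements) for n in cnodes):
        violated.add("c1")
    if not all(n is not None for n in links):
        violated.add("c2")
    if not all(n is not None for n in anodes):
        violated.add("c3")
    live = [cn for cn in cnodes if cn is not None]
    if not all(not isinstance(e, Link) or (e.frm in cn.elements and e.to in cn.elements)
               for cn in live for e in cn.elements):
        violated.add("c4")
    if not all(n1 is n2 or all(n not in n2.elements for n in n1.elements)
               for n1 in live for n2 in live):
        violated.add("c5")
    return violated


a, b = ANode("a"), ANode("b")
ln = Link("l", frm=a, to=b)
doc = CNode("doc", elements=[a, b, ln])
print(check([doc], [ln], [a, b]))                 # set()

other = CNode("other", elements=[a])              # a now has two parents
print(sorted(check([doc, other], [ln], [a, b])))  # ['c5']
```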

The command language we use consists of a small number of commonly used con- 
structs. Atomic updates are object creation, and variable and attribute update. There 
is no object deletion, because persistence by reachability is used (as in Java and the O2 database system [1]): that is, an object is in the database as long as it is directly
or indirectly reachable from one of the roots. Compound commands are formed using 
sequential composition, bounded iteration, conditional branch, collection iteration, and 
(at present) non-recursive update method call. 
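To make persistence by reachability concrete, the following Python sketch computes which objects would remain in the database. It assumes, purely for illustration, that the store is a dict from hypothetical integer oids to the set of oids each object references, and that the persistent roots are given as oids; the names `reachable` and `persistent_part` are our own and are not part of OASIS.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Hypothetical representation: each oid maps to the oids it references.
Store = Dict[int, Set[int]]


def reachable(store: Store, roots: Iterable[int]) -> Set[int]:
    """Return the oids reachable (directly or indirectly) from the roots."""
    seen: Set[int] = set()
    queue = deque(r for r in roots if r in store)
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        # Follow references only to oids that are bound in the store.
        for ref in store[oid]:
            if ref in store and ref not in seen:
                queue.append(ref)
    return seen


def persistent_part(store: Store, roots: Iterable[int]) -> Store:
    """Keep only reachable objects; unreachable ones are implicitly dropped."""
    live = reachable(store, roots)
    return {oid: refs for oid, refs in store.items() if oid in live}


if __name__ == "__main__":
    # Root 1 references 2, and 2 references 3; object 4 is not reachable.
    store = {1: {2}, 2: {3}, 3: set(), 4: {1}}
    print(sorted(reachable(store, [1])))
```

Object 4 references the root but is not itself reachable from it, so under this reading it is no longer part of the database, even though no explicit deletion was performed.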

Method bodies are defined using a command statement. A method can apply up- 
dates to the receiving (i.e., this) object, as well as to the objects referenced by this, 
the persistent roots, and the attributes of objects passed in as actual parameters. An 
OASIS schema also declares a number of named transactions. Transactions are simi- 
lar to methods, but there is no receiver object. A transaction typically executes a se- 
quence of method applications. Traditionally, the notion of database integrity is tied to 
database transactions, but our system also allows one to verify integrity at the method 
level (which is often preferred). In this paper, we focus on methods rather than transac- 
tions. 

Figure 2 gives some example method definitions for the schema. Method removeNodeOrLink on composite nodes will be used as an example in later sections. This
method removes an Element from the elements component of the receiver CNode object, provided that it is not connected to any other Element (within the same CNode).
This condition is tested by applying the abstract method isConnectedTo of class Element, which has different concrete implementations in classes Node and Link. Late
binding selects the appropriate implementation of the method, based on the run-time
type of the receiver object. Method removeNodeOrLink respects the integrity constraints on the schema. In Section 6, we show how the OASIS system proves this automatically. Constraint c4 is non-trivial with respect to this method, because both the
constraint and the method address the elements attribute of a CNode.
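The effect of late binding in removeNodeOrLink can be sketched in Python, where method overriding plays the role of OASIS late binding. This is an illustrative object-level analogue under our own naming (`is_connected_to`, `remove_node_or_link`, and `frm` in place of the reserved word `from`); it is not the OASIS semantics, which Section 5 defines in HOL.

```python
from __future__ import annotations

from typing import Optional, Set


class Element:
    """Abstract element; is_connected_to is resolved on the run-time type."""

    def __init__(self, name: str, position: int) -> None:
        self.name = name
        self.position = position

    def is_connected_to(self, n: Element) -> bool:
        raise NotImplementedError  # abstract, as in class Element


class Node(Element):
    def __init__(self, name: str, position: int) -> None:
        super().__init__(name, position)
        self.in_links: Set[Link] = set()
        self.out_links: Set[Link] = set()

    def is_connected_to(self, n: Element) -> bool:
        return n in self.in_links or n in self.out_links


class Link(Element):
    def __init__(self, name: str, position: int, frm: Node, to: Node) -> None:
        super().__init__(name, position)
        self.frm = frm  # 'from' is a reserved word in Python
        self.to = to

    def is_connected_to(self, n: Element) -> bool:
        return self.frm is n or self.to is n


class CNode(Node):
    def __init__(self, name: str, position: int, size: int) -> None:
        super().__init__(name, position)
        self.size = size
        self.elements: Set[Element] = set()

    def remove_node_or_link(self, n: Optional[Element]) -> bool:
        # Remove n only if it is non-nil, a member of elements, and no
        # element is connected to it; each x.is_connected_to call is
        # dispatched on x's run-time class (Node or Link).
        if (n is not None and n in self.elements
                and not any(x.is_connected_to(n) for x in self.elements)):
            self.elements.discard(n)
            return True
        return False
```

In this sketch a link cannot be removed while its end nodes still list it in their in- or out-links, because those nodes report being connected to it.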



4 A generic Isabelle theory of objects 

Isabelle specifications are called theories. A theory consists of a collection of defini- 
tions and axioms. Our system extends the default collection of Isabelle/HOL data type 
theories that are available. In this section, we define a generic theory of objects, which 
describes schema-independent knowledge about object-oriented databases. Database- 
specific knowledge can be expressed in terms of this theory (this is the subject of Sec- 
tion 5). Isabelle/HOL syntax is similar to ML syntax and is for the most part self-expla- 
natory. We give annotations to clarify its more cryptic symbols. 

The database state (object store) is modelled as a partial function from object iden- 
tifiers to values. In Isabelle, we represent such functions using the predefined ‘option’ 
data type, as ‘oid => ’b option.’ Isabelle/HOL function types (=>) are total; par- 
tial function types can be modelled using options. The option data type includes the 
constructors None (to represent undefined function results) and Some (to represent de- 
fined function results — the actual value is supplied as an argument). The type variable 
CNode CNode::createCNodeIn(int p, int z, string s) {
  var n : CNode {
    n = new CNode(s, p, z); elements += set(n); cnodes += set(n)
  } returns (n) };

boolean CNode::removeNodeOrLink(Element n) {
  if (n != nil) and (n in elements) and
     (forall x in elements : not (x.isConnectedTo(n))) then {
    elements -= set(n) } returns (true)
  else { skip } returns (false) };

boolean Node::isConnectedTo(Element n) {
  (n in inLinks) or (n in outLinks) };

boolean Link::isConnectedTo(Element n) { (from == n) or (to == n) };



Fig. 2. Example Method Definitions for the SEPIA Schema 



β (written ’b) in the codomain type of ‘oid => ’b option’ will be instantiated with
a concrete type that describes the schema-specific class structures (see Section 5). The
type of object identifiers (oid) is defined as a datatype, which we omit here.

On this abstract notion of database state, we define several higher-order functions 
for database retrieval and update. Figure 3 lists these operations with their signatures. 
These functions are modelled as schema-independent operations, which take (functions 



oids  :: (oid => 'b option) => oid set
eval  :: ['b option, 'b => bool] => bool
get   :: ['b option, 'b => 'a] => 'a
set   :: [(oid => 'b option), oid, 'b => 'b] => (oid => 'b option)
smash :: [(oid => 'b option), (oid => 'b option)] => (oid => 'b option)
apply :: ['a set, ['a, oid] => 'b option] => (oid => 'b option)
new   :: [oid, 'b] => (oid => 'b option)
skip  :: (oid => 'b option)



Fig. 3. Generic Operations on Objects 



as) parameters to make them specific. The operations oids, get, and eval are used 
to retrieve information from the state. For example, the operation get is used for the 
translation of attribute selection. The other operations in the figure are used to update 
the state; they result in a “little” object store (called a delta value [6]), which comprises 
local changes to the state. For example, the operation set is used for the translation of 
attribute assignment. The smash operation is used to encode sequential compositions 
(‘;’) of commands. It is defined as a functional override, where the bindings in the 
second argument take precedence over those in the first. The smash operation is also
used to apply method changes to the object store. 
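As a rough executable analogue of Figure 3, the sketch below models a store and a delta value as Python dicts (a missing key playing the role of None), with `smash` as a dict override. The Python names (`lookup`, `eval_`, `set_`, `ARBITRARY`) are our own, chosen to avoid clashing with built-ins; `apply` is omitted, and stored values are assumed never to be Python `None`, since `None` stands for an undefined lookup.

```python
from typing import Any, Callable, Dict, Hashable, Optional, Set

# A store (and a delta value) is a partial function oid => 'b, modelled as a
# dict; a missing key plays the role of HOL's None.
Store = Dict[Hashable, Any]

# Stand-in for HOL's unspecified 'arbitrary' value.
ARBITRARY = object()


def lookup(os: Store, oid: Hashable) -> Optional[Any]:
    """(os oid): Some v is modelled as v, None as Python None."""
    return os.get(oid)


def oids(os: Store) -> Set[Hashable]:
    return set(os)


def eval_(v: Optional[Any], p: Callable[[Any], bool]) -> bool:
    return v is not None and p(v)


def get(v: Optional[Any], g: Callable[[Any], Any]) -> Any:
    return g(v) if v is not None else ARBITRARY


def set_(os: Store, oid: Hashable, f: Callable[[Any], Any]) -> Store:
    """Delta rebinding oid to f(old value); empty if oid is unbound."""
    return {oid: f(os[oid])} if oid in os else {}


def smash(d1: Store, d2: Store) -> Store:
    """Functional override: bindings in d2 take precedence over d1."""
    return {**d1, **d2}


def new(oid: Hashable, v: Any) -> Store:
    return {oid: v}


def skip() -> Store:
    return {}
```

Sequential composition then corresponds to nesting `smash` calls, and committing a method's changes corresponds to `smash(os, delta)`.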

Isabelle can be used to prove abstract properties (theorems) about the operations in 
Figure 3, based on their definitions in HOL. At present, the generic theory of objects 
includes 49 theorems. First-order rules are derived for the associativity and reflexivity 
of smash. Second-order rewrite rules (with functions in arguments) are derived for applications of eval and get to modified object store values. Below, we give an example
of one of these theorems (rule r1):

  get ((smash os1 (set os2 idb f)) ida) g =
    (if idb=ida & idb:oids os2 then get (os2 ida) (g o f)
     else get (os1 ida) g)

This rule illustrates how a get operation is “pushed through” an updated object store. 
Such theorems are used as rewrite rules during proofs, in a left-to-right manner. 
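Rule r1 can also be checked on small concrete stores. The following Python sketch compares both sides of the rule for every combination of bindings over three hypothetical oids; it is a finite test of the equation under a dict-based model (with `f` and `g` chosen arbitrarily and `ARBITRARY` standing in for HOL's unspecified value), not a substitute for the Isabelle proof.

```python
import itertools
from typing import Any, Callable, Dict, FrozenSet, Hashable, List, Optional, Sequence

Store = Dict[Hashable, Any]
ARBITRARY = object()  # stand-in for HOL's unspecified value


def get(v: Optional[Any], g: Callable[[Any], Any]) -> Any:
    return g(v) if v is not None else ARBITRARY


def set_(os: Store, oid: Hashable, f: Callable[[Any], Any]) -> Store:
    return {oid: f(os[oid])} if oid in os else {}


def smash(d1: Store, d2: Store) -> Store:
    return {**d1, **d2}


def lhs(os1, os2, idb, f, ida, g):
    # get ((smash os1 (set os2 idb f)) ida) g
    return get(smash(os1, set_(os2, idb, f)).get(ida), g)


def rhs(os1, os2, idb, f, ida, g):
    # if idb=ida & idb:oids os2 then get (os2 ida) (g o f) else get (os1 ida) g
    if idb == ida and idb in os2:
        return get(os2.get(ida), lambda v: g(f(v)))
    return get(os1.get(ida), g)


def subsets(keys: Sequence[Hashable]) -> List[FrozenSet[Hashable]]:
    return [frozenset(c) for n in range(len(keys) + 1)
            for c in itertools.combinations(keys, n)]


def check_r1(keys: Sequence[int] = (0, 1, 2)) -> bool:
    def f(v: int) -> int:  # an arbitrary attribute update
        return v * 10

    def g(v: int) -> int:  # an arbitrary attribute selection
        return v + 1

    for s1 in subsets(keys):
        for s2 in subsets(keys):
            os1 = {k: 100 + k for k in s1}  # distinct values per store
            os2 = {k: k for k in s2}
            for idb, ida in itertools.product(keys, repeat=2):
                if lhs(os1, os2, idb, f, ida, g) != rhs(os1, os2, idb, f, ida, g):
                    return False
    return True
```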



5 Modelling database-specific knowledge 

The OASIS schema translator supplements the generic theory discussed in the previous 
section with database-specific information. For an input database schema, the schema 
translator generates an Isabelle ‘.thy’ file that contains the database-specific HOL definitions of class structures, methods, transactions, and integrity constraints. In effect,
the schema translator implements a semantics mapping, where the output is HOL no- 
tation. The schema translation has been defined and implemented for all of the OASIS 
constructs we show in this paper (as well as a few others, such as foreach, which we do 
not discuss here). 

The previous section introduced an abstract notion of database state as a partial 
function from oids to values of generic type ’b. For a specific database schema, the 
type variable ’b should be instantiated with type information that reflects the database-specific class hierarchy. This is done using a data type definition:

datatype object = ANode string int (oid set) (oid set) (string set)
                | CNode string int (oid set) (oid set) int (oid set)
                | Link string int oid oid

The above data type is a disjoint union type, with a case for each of the concrete classes 
in the schema; the abstract classes Element and Node are not included, because they do 
not have concrete instantiations. Structural information of objects (i.e., attribute values) 
is supplied as an argument to the data type constructors. This information includes all 
attributes inherited from superclasses. Class references in compound objects appear as 
“pointer” references in the form of oid-values. This accommodates object sharing and 
heterogeneous sets: representations of objects from different classes can be grouped in 
one and the same set, since they all have the same Isabelle type oid. 
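The datatype can be mimicked in Python with tagged tuples, where the first component plays the role of the constructor (the run-time type tag) and references to other objects are plain oids. The helper names `anode`, `cnode`, `link`, and `tag`, and the use of integers as oids, are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple

Oid = int      # hypothetical: oids modelled as plain integers
Value = Tuple  # (tag, field1, field2, ...) mirrors a constructor application


def anode(name: str, position: int, in_links: Iterable[Oid],
          out_links: Iterable[Oid], content: Iterable[str]) -> Value:
    return ("ANode", name, position, frozenset(in_links),
            frozenset(out_links), frozenset(content))


def cnode(name: str, position: int, in_links: Iterable[Oid],
          out_links: Iterable[Oid], size: int, elements: Iterable[Oid]) -> Value:
    return ("CNode", name, position, frozenset(in_links),
            frozenset(out_links), size, frozenset(elements))


def link(name: str, position: int, frm: Oid, to: Oid) -> Value:
    return ("Link", name, position, frm, to)


def tag(v: Value) -> str:
    """Run-time type tag, used for nil/type checks and late binding."""
    return v[0]


# CNode 1 holds ANodes 2 and 3 and the Link 4 between them. Because
# references are oids, the elements set can mix ANode and Link objects,
# and Link 4 is shared by the CNode and both ANodes.
store: Dict[Oid, Value] = {
    1: cnode("c", 0, [], [], 3, [2, 3, 4]),
    2: anode("a", 1, [], [4], ["x"]),
    3: anode("b", 2, [4], [], []),
    4: link("l", 3, 2, 3),
}
```

As in the HOL datatype, inherited attributes (name, position, and for nodes the link sets) appear as leading constructor arguments.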

The constructors of type object provide for the required run-time type information. 
In object-oriented systems with inheritance, this information is needed to model run- 
time type-based decisions, such as late-binding. Using our Isabelle representation, these 
decisions can be conveniently encoded using case-splits to examine the type tag. The
following sections show how to encode OASIS features in terms of the generic theory 
of objects, enhanced with schema-specific information. 

Queries and constraints. The schema translator maps OASIS query expressions to 
functions in Isabelle/HOL. These functions take the input object store as an argument.
The Isabelle predefined data types support most commonly used OQL query language 
constructs [5]. For example, set expressions in OQL (e.g., union, select-from-where, 
except, and intersect) are available in the Isabelle syntax. The translation of most OQL 
expressions is straightforward. However, the translation of operations on objects (e.g., 
attribute selection and nil comparisons) is complicated by the introduction of object 
identifiers. For these constructs, explicit lookups on the object store are needed. We en- 
code these using the generic retrieval operations get and eval of the theory of objects. 

To represent nil comparisons in Isabelle, we make use of the function eval. For 
example, the expression ‘n!=nil’, where n is of type Node, amounts to a check that n is 
in the object store, with the right type. The following Isabelle code accomplishes this: 

eval (os n) (%val. case val
   of ANode name position inLinks outLinks content => True
    | CNode name position inLinks outLinks size elements => True
    | Link name position from to => False)

The expression (os n) looks up the object-typed value associated with oid n. The
second argument to eval is a boolean-valued function (the symbol % is HOL syntax
for λ-abstraction). This function returns True if the type tag on the value is ANode or
CNode; otherwise, if n does not have a binding in os, or is bound to a Link value, then
False is returned. In the examples, we abbreviate the case-split function with a name,
such as isNode for the above.

Attribute selections are coded using the get operation. For example, the OASIS 
expression ‘n. elements’, where n is of type CNode, is represented as follows: 

get (os n) (%val. case val
   of ANode name position inLinks outLinks content => arbitrary
    | CNode name position inLinks outLinks size elements => elements
    | Link name position from to => arbitrary)

Observe that an arbitrary value is returned for the wrongly typed cases; this is a 
common way of dealing with undefined function results in HOL [7]. 
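Both encodings can be sketched in Python over tagged tuples (tag first, then the constructor arguments in datatype order). `is_node` and `elements_of` correspond to the case-split functions abbreviated as isNode and elementsOf in the text; the remaining names and the `ARBITRARY` sentinel for HOL's arbitrary are our own assumptions.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Value = Tuple  # e.g. ("CNode", name, position, inLinks, outLinks, size, elements)
ARBITRARY = object()  # stand-in for HOL's unspecified 'arbitrary'


def eval_(v: Optional[Value], p: Callable[[Value], bool]) -> bool:
    return v is not None and p(v)


def get(v: Optional[Value], g: Callable[[Value], Any]) -> Any:
    return g(v) if v is not None else ARBITRARY


def is_node(val: Value) -> bool:
    # case val of ANode ... => True | CNode ... => True | Link ... => False
    return val[0] in ("ANode", "CNode")


def elements_of(val: Value) -> Any:
    # case val of CNode ... elements => elements | other cases => arbitrary
    return val[6] if val[0] == "CNode" else ARBITRARY


def not_nil_node(os: Dict[int, Value], n: int) -> bool:
    """OASIS 'n != nil' for n of static type Node."""
    return eval_(os.get(n), is_node)


def select_elements(os: Dict[int, Value], n: int) -> Any:
    """OASIS 'n.elements' for n of static type CNode."""
    return get(os.get(n), elements_of)
```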

Constraints are boolean-valued queries. Constraint c4 of the SEPIA schema is represented in Isabelle as follows:

c4 os cnodes links anodes ==
  ! cn: cnodes. ! e: (get (os cn) elementsOf).
    (eval (os e) isLink) -->
      ((get (os e) fromOf) : (get (os cn) elementsOf)) &
      ((get (os e) toOf) : (get (os cn) elementsOf))

In Isabelle syntax, the forall quantifier is written as ‘!’, implication as ‘-->’, and set
membership as ‘:’. The type cast in the original
constraint falls away in the translation to HOL.
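The following Python sketch evaluates c4 on a concrete store in the same spirit, over tagged tuples of the form ("CNode", name, position, inLinks, outLinks, size, elements) and ("Link", name, position, from, to). It assumes, as the proof does through c1, that every oid in cnodes is bound to a CNode; the function name and representation are illustrative.

```python
from typing import Dict, Iterable, Tuple

# Tagged tuples: ("CNode", name, position, inLinks, outLinks, size, elements),
# ("ANode", name, position, inLinks, outLinks, content),
# ("Link", name, position, from, to).
Value = Tuple
Store = Dict[int, Value]


def c4(os: Store, cnodes: Iterable[int]) -> bool:
    """Every Link among a CNode's elements connects elements of that CNode.

    Assumes each cn in cnodes is bound to a CNode (guaranteed by c1 in the
    proof); in HOL an ill-typed selection would yield an unspecified value.
    """
    for cn in cnodes:
        elements = os[cn][6]                          # get (os cn) elementsOf
        for e in elements:
            val = os.get(e)
            if val is not None and val[0] == "Link":  # eval (os e) isLink
                frm, to = val[3], val[4]              # fromOf, toOf
                if frm not in elements or to not in elements:
                    return False
    return True
```

The cast ((Link)(e)) has no counterpart here either: the tag test already selects the Link case before its fields are read.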
Update methods, late binding, and transactions. Update methods are represented as
named functions in HOL. Such functions map an input object store, persistent roots, an
oid this, actual parameter values and any required new oids to a tuple. The tuple includes
the modifications to the object store, persistent roots, and method parameters; the return
value of the method is given in the last position of the tuple. The removeNodeOrLink
method of class CNode has the following HOL representation:

CNode_removeNodeOrLink os cnodes links anodes this n ==
  if (eval (os n) isElement) & n : (get (os this) elementsOf) &
     (! x: (get (os this) elementsOf).
        ~ (if (eval (os x) isLink)
           then Link_isConnectedTo os cnodes links anodes x n
           else Node_isConnectedTo os cnodes links anodes x n))
  then (set os this f, True)
  else (skip, False)

The right-hand side is a conditional expression that reflects the structure of the original 
method body. Within the conditional, the application of the isConnectedTo method to 
element object ‘x’ in the if-clause involves late binding: based on the actual run-time 
type of ‘x’, the correct implementation of the method is applied. In our framework, such 
a run-time type-based decision is easily expressed using an if-then-else clause and
the eval predicate. The inner conditional expression yields a boolean value, which is
negated with the operator ‘~’. It is important to realise that nothing is computed by a
conditional expression; it is only used as an assumption in the then and else branches 
of the proof. 

The first component of the tuple returned by the then branch is a set expression,
which describes the effects of the assignment to the elements attribute of the this object,
in an algebraic manner. The function f abbreviates a case-split for the actual update:

(%val. case val
   of ANode name position inLinks outLinks content =>
        ANode name position inLinks outLinks content
    | CNode name position inLinks outLinks size elements =>
        CNode name position inLinks outLinks size (elements - {n})
    | Link name position from to => Link name position from to)

The second component of the tuple is the return value of the method, which is a boolean 
value. We omit changes to the persistent roots and parameters in the above example. 
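The translated method can be mirrored in Python as a function that returns a (delta, result) pair instead of mutating the store, with late binding expressed as a test on the type tag of x. This sketch works over tagged tuples whose first component is the constructor name; it ignores persistent roots and parameters, as the text does, and assumes that `this` is bound to a CNode and that every oid in its elements is bound. All function names are hypothetical.

```python
from typing import Dict, Tuple

# ("ANode", name, pos, inLinks, outLinks, content)
# ("CNode", name, pos, inLinks, outLinks, size, elements)
# ("Link", name, pos, from, to)
Value = Tuple
Store = Dict[int, Value]


def node_is_connected_to(os: Store, x: int, n: int) -> bool:
    val = os[x]
    return n in val[3] or n in val[4]  # (n in inLinks) or (n in outLinks)


def link_is_connected_to(os: Store, x: int, n: int) -> bool:
    val = os[x]
    return val[3] == n or val[4] == n  # (from == n) or (to == n)


def cnode_remove_node_or_link(os: Store, this: int, n: int) -> Tuple[Store, bool]:
    """Return (delta, result) as in the HOL translation (roots omitted)."""
    elements = os[this][6]

    def connected(x: int) -> bool:
        # Late binding: the type tag of x selects the implementation.
        if os[x][0] == "Link":
            return link_is_connected_to(os, x, n)
        return node_is_connected_to(os, x, n)

    # eval (os n) isElement: every bound value is an Element here.
    if n in os and n in elements and not any(connected(x) for x in elements):
        old = os[this]
        # set os this f: rebind 'this' with n removed from elements.
        return {this: old[:6] + (old[6] - {n},)}, True
    return {}, False  # (skip, False)
```

Returning a delta rather than updating in place keeps the method side-effect free, which is what allows the caller to decide when to commit it with smash.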

Our schema translator generates less “efficient” code than that shown above; this is 
inherent in automatic code generation. However, we easily obtain the above simplified 
form, using term rewriting (see Section 6). 

A transaction is not the same as a method: a transaction is a sequence of updates, 
whose changes are not propagated to the database until the transaction commits. A 
transaction is further distinguished by not having a receiver object. Transaction seman- 
tics is provided by applying an additional smash to the input object store and the delta 
value that represents the transaction body’s updates. A method can be “lifted” to the 
transaction level by putting code to look up the receiver object in the transaction, and
then applying the method. The next section uses an example in which we give transaction semantics to the removeNodeOrLink method.
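A minimal Python sketch of this lifting, assuming a method is any function from (store, receiver oid) to a (delta, result) pair: the transaction looks up a receiver with a caller-supplied function, runs the method to obtain a delta, and commits it with one final smash. `lift` and `lookup_receiver` are hypothetical names, not part of OASIS.

```python
from typing import Any, Callable, Dict, Hashable, Tuple

Store = Dict[Hashable, Any]
Method = Callable[[Store, Hashable], Tuple[Store, Any]]


def smash(d1: Store, d2: Store) -> Store:
    """Functional override: bindings in d2 take precedence."""
    return {**d1, **d2}


def lift(method: Method,
         lookup_receiver: Callable[[Store], Hashable]
         ) -> Callable[[Store], Tuple[Store, Any]]:
    """Turn a method into a receiver-less transaction."""
    def transaction(os: Store) -> Tuple[Store, Any]:
        this = lookup_receiver(os)        # no receiver: find one in the store
        delta, result = method(os, this)  # the body yields a "little" store
        return smash(os, delta), result   # commit: one final smash
    return transaction
```

Because `smash` builds a new dict, the input store is left untouched until the caller adopts the committed result.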

6 Using the system 

The OASIS tool currently provides support for automated transaction safety analysis. 
The tool implements an automated proof strategy, which consists of the following
four successive steps: (i) specification of an initial proof goal; (ii) normalisation of the 
goal using rewriting; (iii) safe natural deduction inference steps; and (iv) exhaustive 
depth-first search. This strategy can verify many non-trivial combinations of transac- 
tions and constraints, although the search is inherently incomplete [2]. The automated 
proof procedure returns any goals that it cannot solve. We now explain in detail each of 
these steps. 

Starting a transaction safety proof. To start a transaction (or method) safety proof, an 
Isabelle proof goal should first be constructed. Our schema translator defines the ML
functions start_proof and method_safety_goal, which automate this process for
a given method and constraint. For example, to verify that method removeNodeOrLink,
defined in class CNode, is safe with respect to constraint c4, we type the following:

- start_proof (method_safety_goal ("removeNodeOrLink", "CNode", "c4", ["c1"]));

Verification of a method or transaction with respect to an individual constraint predicate
may depend on additional constraints on the schema. In this example, constraint c1 is
necessarily assumed, since in order to extract the elements attribute from a CNode
object, that object must be non-nil. Additional assumptions are passed as a list (here
["c1"]) in the last argument of method_safety_goal. Isabelle now responds with the following initial proof
goal: 

Level 0

(eval (os this) isCNode) &
c4 os cnodes links anodes & c1 os cnodes links anodes -->
  (let (delta, result) =
         CNode_removeNodeOrLink os cnodes links anodes this n
   in c4 (smash os delta) cnodes links anodes)

The goal is in the form of an implication, where the constraints are assumed to hold 
in the initial state os (as seen in the premise); the conclusion is in the form of a let 
expression, which substitutes the modifications resulting from the method application 
into the constraint expression. Recall that our running example ignores modifications to
the persistent roots. Observe that the new database state in which the constraint is eval- 
uated takes the form (smash os delta). The smash “implements” the transaction- 
level commit of the changes in the little object store delta to the input object store os, 
as mentioned in Section 5. 
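The shape of this goal can be illustrated by evaluating it on concrete states in Python. The sketch below uses hypothetical names, a dict-based store, and a `pre` predicate standing in for premises such as eval (os this) isCNode; it reports states that violate the implication. Unlike the Isabelle proof, it only tests the finitely many states it is given, so an empty result is evidence, not a proof.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Store = Dict[Hashable, Any]
Pred = Callable[[Store], bool]
Pre = Callable[[Store, Hashable], bool]
Method = Callable[[Store, Hashable], Tuple[Store, Any]]


def smash(os: Store, delta: Store) -> Store:
    return {**os, **delta}


def safety_goal(os: Store, this: Hashable, pre: Pre, constraint: Pred,
                assumptions: Sequence[Pred], method: Method) -> bool:
    """pre & c & assumptions --> (let (delta, _) = method os this
                                  in c (smash os delta))"""
    if not (pre(os, this) and constraint(os)
            and all(a(os) for a in assumptions)):
        return True  # false premise: the implication holds trivially
    delta, _result = method(os, this)
    return constraint(smash(os, delta))


def find_counterexamples(states: Iterable[Tuple[Store, Hashable]], pre: Pre,
                         constraint: Pred, assumptions: Sequence[Pred],
                         method: Method) -> List[Tuple[Store, Hashable]]:
    """Test (not prove) the goal on finitely many (store, receiver) pairs."""
    return [(os, this) for os, this in states
            if not safety_goal(os, this, pre, constraint, assumptions, method)]
```

Extra method arguments such as n can be fixed beforehand with a closure or `functools.partial`.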
Normalisation of the proof goal. The actual proof starts by unfolding the database-specific definitions (of methods, constraints, and transactions) in the initial goal. This
is done using the Isabelle Simplifier. The Simplifier performs term-rewriting with a
set of theorems of the following form: [| H1; ...; Hn |] ==> LHS = RHS. Such
theorems are read as conditional rewrite rules: a term unifying with the expression on
the left-hand side of the equality sign (LHS) is rewritten to the term that appears on the
right-hand side (RHS), provided that the hypotheses (H1, ..., Hn) hold. The default
Isabelle Simplifier installs a large collection of standard reduction rules for HOL; new
rules are easily added to customise the Simplifier to particular tasks. We have extended
the Simplifier by adding a number of rewrite rules for simplifying expressions involving
the constructs of the generic theory of objects. In addition to these, the ‘.ML’ file that
is generated by the schema translator asserts all database-specific definitions as rewrite
rules. Thus definitions are automatically unfolded by the normalisation step.

Unfolding the database-specific definitions rewrites the initial goal into a more complex form, in which every occurrence of the input object store os in the goal’s conclusion is replaced by an expression that reflects the modifications to os. During normalisation, one of the subterms for the example is:

  (get ((smash os (set os this f)) e) fromOf) :
    (get ((smash os (set os this f)) cn) elementsOf)

This subterm represents the condition ‘e.from in cn.elements’ (in constraint c4), in
the context of the updated object store. At this point, patterns such as the above can
be reduced using the rewrite rules of the generic theory of objects. The above term is
rewritten (in several steps) to:

  (get (os e) fromOf) :
    (if cn=this then (get (os cn) elementsOf) - {n}
     else (get (os cn) elementsOf))

The rewriting “pushes” the attribute selection through the algebraic update operations 
(smash, set). For example, the update of the elements attribute is irrelevant with re- 
spect to the selection of the from field. This is identified by the Simplifier by application 
of rule r1 from Section 4. Observe that, in the result term, all attribute selections are expressed directly in terms of the input object store.

During the normalisation phase, constraints that are irrelevant with respect to a part 
of the proof goal can be detected. (For example, straightforward term rewriting can 
already prove that method removeNodeOrLink does not interact with constraint C2.) 
The example proof above requires more analysis, because updates are applied to the 
same parts of the database (i.e., the elements attribute). 

Safe natural deduction inference steps. In addition to term rewriting with the Simplifier, 
Isabelle also uses natural deduction. Its Classical Reasoner uses a set of introduction 
and elimination rules (i.e., theorems) for higher-order logic to automate natural de- 
duction inferences. The default configuration of the tool includes machinery to reason 
about sets, lists, tuples, booleans, etc. The tool implements a depth-first search strat- 
egy; variables introduced by the use of quantifiers can be automatically instantiated, 
and backtracking is performed between different alternative unifiers. The tool requires 
a distinction to be made between so-called safe and unsafe rules. Safe rules can be
applied deterministically; they do not introduce or instantiate variables, so there is no 
need to undo any of these steps at later stages in the proof. For example, introduction of 
universal quantification is safe, whereas its elimination is unsafe. Safe steps get rid of 
trivial cases. The Classical Reasoner interleaves these steps with further simplification. 

As we did for the Simplifier tool, some extensions have to be made to the Classical 
Reasoner. The extensions include a database-specific rule for the introduction (and its 
converse rule for elimination) of the predicate eval. These rules (and their proof scripts) 
are generated automatically by the OASIS schema translator and reside in the ‘.ML’ file;
they provide a mechanism for case-based reasoning for the database-specific object 
type. For example, for an expression of type Node, cases are generated for types ANode 
and CNode; simplification immediately discards the other cases, which are irrelevant. 
Applying safe inference steps to our example goal generates a list of 12 subgoals. These 
goals require more in-depth analysis. 

Exhaustive depth-first search. Once the safe steps have been performed, any remaining 
goals are subject to an exhaustive depth-first analysis [7]. Safe inference steps are now
interleaved with unsafe steps. This may involve backtracking, and undoing of unifica- 
tion steps. Isabelle allows a limit to be imposed on the search depth. This guarantees 
termination of the search tactics. In our practical experiments, a depth of 2 was suffi- 
cient for most cases. 
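The role of the depth limit can be illustrated with a toy backward-chaining prover in Python. This is a generic sketch, not Isabelle's Classical Reasoner: goals are plain strings, there is no unification, and a hypothetical rule base maps each goal to alternative lists of subgoals. Backtracking happens when one alternative fails and the next is tried, and the bound guarantees termination even for cyclic rules.

```python
from typing import Dict, List, Sequence

# Hypothetical rule base: a goal is proved if all subgoals of some
# alternative are proved; an empty alternative acts as an axiom.
Rules = Dict[str, List[Sequence[str]]]


def prove(goal: str, rules: Rules, depth: int) -> bool:
    """Depth-bounded depth-first search with backtracking over alternatives."""
    if depth < 0:
        return False  # bound exceeded: abandon this branch
    for alternative in rules.get(goal, []):
        # Try this rule; if any subgoal fails, fall through to the next one.
        if all(prove(sub, rules, depth - 1) for sub in alternative):
            return True
    return False
```

With the rules p ← q, p ← r ∧ s, q ← q, r, and s ← r, the goal p is found at depth 2 but not at depth 1, while the cyclic goal q is never proved yet the search still terminates.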

Steps (ii) to (iv) of the automated proof strategy are packaged as a single Isabelle 
tactic (oasis_tac, which is a customization of Isabelle’s auto_tac). A tactic is a 
proof procedure (i.e., proof instructions for the system) that may implement a heuristic. 
The oasis_tac tactic takes as a parameter a limit on the search depth. Calling this 
tactic with a depth of 2 on the example’s initial goal produces the following output: 

> by (oasis_tac (claset()) (simpset()) 2);
Applying simplification steps...
Applying safe inference steps...
Now trying : 12... Done!
Now trying : 11... Done!

No subgoals!

The oasis_tac tactic automatically finds the required proof, using exhaustive depth-first search. Isabelle prints the just-proved theorem (omitted from the output), and the
message “No subgoals!”.

Practical results. The OASIS schema translator consists of approximately 2029 lines
of ML code. At present, the generic OO theory comprises 632 lines of Isabelle/HOL code and
49 theorems. The input SEPIA schema currently includes 6 class definitions, 18 method
definitions, and 5 constraints.¹ The Isabelle/HOL theory and ML files generated for this
schema comprise 162 lines of code.

Table 1 shows experimental results for verifying the safety of two methods of class 
¹ Only parts of the SEPIA schema are shown in this paper.
Method                    Constraint   Proof time (s)
CNode::removeNodeOrLink   c1                 3.35
CNode::removeNodeOrLink   c2                 1.04
CNode::removeNodeOrLink   c3                 1.06
CNode::removeNodeOrLink   c4               161.77
CNode::removeNodeOrLink   c5               109.63
CNode::createCNodeIn      c1                10.93
CNode::createCNodeIn      c2                 2.89
CNode::createCNodeIn      c3                 2.88
CNode::createCNodeIn      c4               222.46
CNode::createCNodeIn      c5               551.49

Table 1. Some Experimental Results for Method Safety



CNode, with respect to the constraints in Figure 1. All proof times are in seconds, with
Isabelle running on a SUN 296 MHz Ultra-SPARC-II, under Solaris. The times given
are only a rough guide to the efficiency of the automated method safety proofs. The
times indicate that the trivial proofs are immediately solved by the theorem prover. For
example, constraint c2 and method removeNodeOrLink operate on
different attributes. As discussed in the previous section, the proof is trivial and is done
using straightforward term rewriting by the Simplifier. The real power of the theorem
prover reveals itself in the cases where the constraint and method operate on the same
attributes and/or persistent roots. For example, the combination of constraint c4 and
method removeNodeOrLink (illustrated in the previous sections) takes 161.77 seconds.
In this case, the proof involves many tedious steps.

7 Related work 

Theorem prover techniques have been applied in the context of relational databases 
using formalisms such as Boyer-Moore logic [14] and Hoare logic [11], for the veri- 
fication ([14]) and deductive synthesis ([11]) of transactions that respect a number of 
static integrity constraints. Our work shares similarities with these approaches, but it 
is based on an object-oriented framework and uses a modern theorem prover. At the 
time the above authors published their work, theorem prover technology was still in an 
early stage of development. For example, in [14], higher-order extensions are made to 
a first order theorem prover, and standard data types such as natural numbers and sets 
are defined from scratch. Nowadays, these modelling capabilities are available “off the 
shelf,” using a standard HOL theorem prover. 

Within an object-oriented database framework, Benzaken et al. [3] study the problem of method verification with respect to static integrity constraints, using abstract
interpretation. A tableaux reasoner is used to analyse some properties of application 
code using first-order logic. However, important issues such as transactions, type infor- 
mation, and object sharing are not addressed. 

Theorem prover techniques that use higher-order logic are applied in the context of 
object-oriented programming in [13,8]. Santen [13] uses Isabelle/HOL to reason about 
class specifications in Object-Z. A trace semantics is encoded to support reasoning
about behavioural relations between classes. Jacobs et al. study the verification of Java
code, using the PVS theorem prover [8]. A tool called LOOP (Logic of Object-Oriented
Programming) translates Java classes into the higher-order logic of the PVS system.
The semantics of their approach is based on coalgebras, in particular to support proofs
about refinement relations. Jacobs et al. address a number of issues that we do not, such
as exceptions, termination, and recursion. In contrast to the work on object-oriented 
programming, we study database transactions on a persistent object store, rather than 
the behaviour of individual objects. 

The work in this paper extends our previous work [15] by considering additional
topics such as inheritance and heterogeneity. Here, emphasis is placed on modelling an
object-oriented database schema in HOL, and on the extensions to the Isabelle system
to provide automated reasoning for such a database schema. We build on the ideas of
Doherty and Hull [6], in which database state changes are encoded as delta values (a
difference between database states). In their work, delta values are used to describe
proposed updates in the context of cooperative work, whereas in our work, delta values
are used to cope with intra-transaction parallelism due to set-oriented updates.



8 Conclusions and future work 

We have shown how to represent the constructs of an object-oriented database speci- 
fication language in the higher-order logic of the Isabelle theorem prover. To achieve 
this, we defined an Isabelle theory of objects, which resembles the type-tagged memory 
of a persistent object store. The constructs of the specification language are defined as 
generic higher-order operations in this theory. Higher-order logic allows us to achieve 
schema-independent reasoning: we have proved theorems about the generic operations 
that are used in reasoning about specific database operations. 

We presented some of our experimental results on the static analysis of database in- 
tegrity. The example proof shown in Section 6 involves a combination of typical object- 
oriented features (namely, heterogeneous collections, abstract methods, down-casting, 
late binding, and nil references). This example is representative of the interaction of lan- 
guage features encountered in many object-oriented applications. The example schema 
we are working with is based on the generic graph editing functionality of a real system 
(the SEPIA system [16]). All 90 method safety requirements in the case study could 
be verified automatically, using the Isabelle tool. It is worth mentioning that our initial 
specification contained a few bugs, such as forgotten nil-checks. These kinds of errors 
in the schema are easily overlooked by the specifier, but immediately spotted by the 
theorem prover. 

Our tool is not limited to transaction safety analysis. Because the theory used by the
tool is based on very general semantic properties of the update language, we expect
our experimental results to extend to the kinds of proof requirements encountered
in other application areas where reasoning about the semantics of database operations
is needed. We are currently looking at applications of the OASIS reasoning tool in
the areas of workflow and cooperative work, for example for the verification of compensation
requirements (that is, proofs that one method compensates for the results of another).
Abstract. The results presented here are based on the experience of developing and
applying DYANA, an environment for the analysis of the operation of multiprocessor
computer systems. The architecture and basic features of such an environment are
discussed. The main problems in designing such an environment are highlighted and
possible solutions are shown. The key features of the DYANA environment are: the
possibility of both quantitative and algorithmic analysis of the system to be modelled;
a time complexity estimation subsystem, which helps to avoid instruction-level
simulation of the target computer system; and support for program development
through simulation.



1 Introduction 

Simulation is usually a significant stage of a product’s life cycle. The more complex
the product, the more substantial the simulation stage in its life cycle.

The suitability of simulation modelling from the viewpoint of software development
manufacturability depends on the answers to two questions:

- to what extent is the transition from the model to the product itself simple
and efficient?

- how manufacturable is the process of creating and investigating the simulation
model?

In other words, how well does the process of model creation and investigation ’fit’ into
the process of product development?

From the viewpoint of the model-to-product transition, the ideal case is when we obtain
the product as a result of simulation, or when the transition is completely automated.

In this article we investigate the model-to-product transition for objects such as
embedded multiprocessor systems. The state-of-the-art technology for developing such
systems is characterized by the following. In the area of hardware, there exist mature
technologies for automated transition from a hardware description to its implementation.
As a rule, this hardware description is a result of simulation. However, this transition
has been developed for the chip level only and is implemented in CAD systems based on
the VHDL and Verilog languages.

Support for such a transition starting from the system level is now a ’hot’
topic. There are many reasons for this; the main one is that hardware development
now proceeds far faster than software development [7]. The authors do not know of
any design environment enabling the model-to-product transition starting from the
system level.

In the area of software development, simulation is not used in practice. The
various specification methods do not make it possible to estimate the properties
of a program under development with respect to a particular hardware environment.

The manufacturability of simulation model development and investigation strongly
depends on the concepts of the simulation environment being used and on how well
this environment covers all development stages. Such an environment should
include at least the following: a simulation modelling language; a programming
language (if we wish to obtain a program as the product) or a hardware
description language (if the product is a hardware component); and a system
behaviour specification language. Appropriate graphical facilities, editors,
compilers, etc. are required for both model and program development. Over the
last 30 years, more than 200 languages and environments have been proposed [8],
with various concepts and capabilities. However, none of them was aimed at the
investigation and development of multiprocessor distributed computer systems.

These environments use different languages at different steps (e.g. for model
description, for specification, etc.), so the problem of syntactic and semantic
consistency arises immediately.

All these environments have different architectures. The absence of a stable and
unified architecture (one that is clear and convenient for the user and provides
integration of all the necessary tools) complicates portability and the use of
these environments in a client-server network architecture.

We will try to answer the questions mentioned above and show possible solutions
using the case of the DYANA system, applied to problems of development and
analysis of the operation of distributed multiprocessor computer systems.

2 Project goals 

The DYANA system (DYnamic ANAlyzer) is a software system intended to help analyze
the operation of distributed computer environments. The design and development
of the system were aimed at the following:

— to develop a tool for describing both the software and the hardware behaviour
of distributed systems at the system level;

— to develop a tool for estimating system performance under different tradeoffs
between hardware and software at the system-level design stage;

— to enable the application of both algorithmic and quantitative methods of
analysis to the same model description [1];

— to make it possible to vary the level of detail of behaviour analysis depending
on the level of detail of the description (this goal has a ’side effect’:
investigating the methodology of program development through simulation and
stepwise refinement);

— to experiment with simulation models of software and hardware independently.

The goals mentioned above imply solving the following problems:

— how to describe such particularities of the modeled object as nondeterminism of
program behaviour, independence of program behaviour from time, absence of a
unique time in a distributed system, shared resources, and the existence of two
types of parallelism (interleaving and real concurrency)?

— how to measure the "computational work" of the program being analyzed, and how
to map this measure onto time for a given hardware environment?

— how to provide a model development technology that supports the "top-down"
approach and enables reuse of model components?

— how to integrate all the tools involved in product development?

In other words, the main goal of the project is to develop an instrumental
environment that enables the user to describe the target software and hardware at
the system level and to analyze the behaviour of the target system as a whole.
Such an environment also allows software to be developed through simulation.

Suppose that we can describe the software with a variable degree of detail and
analyze its behaviour. Essentially, this description is a model, since we make it
for the purpose of investigation and analysis. By gradually refining this
description, we obtain a program, that is, an algorithm description created for
application rather than analysis. This program must be guaranteed to have all the
properties checked during analysis.

Generally, the idea of software design through simulation is not new. Examples
are industry-level systems for design in the SDL language (SDT from Telelogic
[10]), systems supporting the OMT and ROOM methodologies [6], and the Ptolemy
simulation environment [11]. An interesting environment, SimOS [9], makes it
possible to emulate hardware and estimate its performance on a ’realistic’
workload, up to industrial operating systems and applications.

The main differences and advantages of our approach are as follows. 

First, the developer is able to analyze precisely the dynamics (behaviour) of
both the hardware and the software, and in particular the software behaviour with
respect to a given target hardware environment.

Second, it is possible to determine the program’s resource usage, e.g. the
execution time of a given code block for the target CPU architecture. Certain
powerful environments such as ObjecTime [13], by contrast, focus on software
development and code generation for a target real-time OS.

Third, within our approach it is possible to estimate and verify both
quantitative aspects of program behaviour (e.g. performance indices) and logical
(algorithmic) properties without any rewriting of the model description.
Fourth, the proposed approach enables the user to connect the static program
description (i.e. its text) with its dynamics. Namely, DYANA makes it possible to
link an event of interest with the corresponding code block.

The theoretical issues of our approach, along with a description of the first
version of the tools, were given in [3].

The rest of this paper is organized as follows. The next section briefly presents
the computational model used in DYANA. Sect. 4 shows, by example, the
capabilities of software description and of refining a model down to an
executable program. Sect. 5 describes the DYANA architecture.

3 The Computational Model. Language Features Overview

The DYANA model description language, named M2-SIM, is based on the following
model of computation.



Processes and distributed programs. A program is a set of sequential processes
communicating by means of message passing. Every process has a set of input and
output buffers. An attempt to read a message from an empty input buffer blocks
the process until a message arrives. Messages are distinguished by types. In
general, a message type is an equivalence class on the set of message data, but
it can be refined down to a data value (either a simple or a structured one).

Research [4] has shown that this model of computation has certain noteworthy
properties from the viewpoint of algorithmic analysis.

To meet the needs of interprocess communication, two more statements are added:
receive with timeout, and wait for message arrival.
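The blocking semantics of input buffers can be illustrated outside DYANA with ordinary threads. The C++ sketch below rests on several assumptions. The class `InputBuffer` and its methods are hypothetical names. Wall-clock time stands in for model time. A message is reduced to an integer type tag. "Wait for message arrival" is taken to mean waiting until the buffer is non-empty without consuming the message.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical FIFO input buffer of a process. A message is reduced to an
// integer type tag; real threads and wall-clock time stand in for model time.
class InputBuffer {
public:
    // Deliver a message to the buffer and wake up one blocked reader.
    void put(int msgType) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(msgType);
        }
        cv_.notify_one();
    }

    // receive: blocks while the buffer is empty, then removes the oldest message.
    int receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int msg = q_.front();
        q_.pop_front();
        return msg;
    }

    // receive with timeout: returns std::nullopt if nothing arrives in time.
    std::optional<int> receiveFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return !q_.empty(); }))
            return std::nullopt;
        int msg = q_.front();
        q_.pop_front();
        return msg;
    }

    // wait for message arrival: blocks until the buffer is non-empty,
    // but leaves the message in the buffer.
    void waitArrival() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> q_;
};
```

A reader thread that calls `receive()` on an empty buffer sleeps until another thread calls `put()`. In a DYANA model run, by contrast, blocking is handled by the discrete-event runtime in model time.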

To support modularity and stepwise refinement, the notion of a distributed
program (DP) is introduced. To form a DP, you declare instances of processes and
establish links between their input and output buffers. Since a DP may also have
inputs and outputs, a process can be replaced with a DP during model refinement.

Both processes and DPs may be parameterized. During DP construction, it is
possible to declare arrays of subcomponents and to use C code blocks to manage
the linking of buffers. (Note: construction is done before the model run, and the
entire model structure remains unchanged during the run, which keeps algorithmic
analysis possible; see Sect. 5.5.)

The mechanism of DP construction described above makes it possible to create
reusable submodels.
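The following C++ sketch roughly illustrates the construction phase. It builds an array of N identical subprocess instances and links the output buffer of each to the input buffer of the next. It then exposes the first input and the last output as the DP's own buffers, so the resulting DP can take the place of a single process with one input and one output. The names `Port`, `DistributedProgram`, `makePipeline` and `Stage` are hypothetical. In DYANA this step is written in the model description language with embedded C code blocks, not through a C++ API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// A buffer is identified as "<instance>.<buffer>", e.g. "Stage[0].in".
using Port = std::string;

// Hypothetical result of a DP construction: subcomponent instances, the links
// between their buffers, and the buffers the DP exposes to its environment.
struct DistributedProgram {
    std::vector<std::string> instances;
    std::vector<std::pair<Port, Port>> links;  // (output buffer, input buffer)
    Port input;                                // DP-level input buffer
    Port output;                               // DP-level output buffer
};

// Declare an array of n "Stage" instances and link them into a chain.
// The structure is fixed once built, mirroring the static model structure.
DistributedProgram makePipeline(int n) {
    assert(n >= 1);
    DistributedProgram dp;
    for (int i = 0; i < n; ++i)
        dp.instances.push_back("Stage[" + std::to_string(i) + "]");
    for (int i = 0; i + 1 < n; ++i)
        dp.links.emplace_back(dp.instances[i] + ".out", dp.instances[i + 1] + ".in");
    dp.input = dp.instances.front() + ".in";
    dp.output = dp.instances.back() + ".out";
    return dp;
}
```

`makePipeline(3)` yields the instances `Stage[0]` to `Stage[2]`, two internal links, and the external buffers `Stage[0].in` and `Stage[2].out`.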

Executors. An important distinctive feature of M2-SIM is the notion of an
executor. An executor represents a hardware component of the system being modeled
and maps the complexity of a process’s internal actions onto model time. Please
refer to Sect. 5.2 for details of mapping computational complexity to time.
Examples of executor application can be found in Sect. 4.2.
Binding. The process-to-executor binding description makes it possible to
describe different kinds of parallelism. Processes bound to the same executor run
in interleaved mode, and those bound to different executors run truly in
parallel. See examples in Sect. 4.2.

4 An Example of Model Construction in DYANA 

The capabilities of model description will be shown using the example of a
robotic control system for a manipulator (i.e. a robot’s arm).

The aim of the manipulator’s work is to catch a moving object (a target). The
manipulator consists of two links and has two degrees of freedom. To detect the
target and determine its coordinates, a vision subsystem is provided; its
particular principle of operation does not affect the subject of this article and
will not be considered.

The idea of the control algorithm is as follows. From the target’s and the
manipulator’s coordinates, a catch point is determined. Then a trajectory for
moving the manipulator to the catch point is computed. The next step is to move
the manipulator along the trajectory. If the trajectory has been traversed and
the target has not been caught, a new catch point is computed, and so on. To
follow the trajectory, a feedback-by-error algorithm is used, which is
implemented on the control computer.

4.1 Model Construction 

The following components operating in parallel can be distinguished in the
system to be modeled: the vision subsystem, the control subsystem and the
manipulator itself. The general model structure is shown in Fig. 1.




Fig. 1. Model structure 



Vision Subsystem. Let us suppose that the target detection algorithm takes no
more than Td time to execute, and that it runs periodically with pause time Tp.
If a target is detected, the vision subsystem sends a message with the target
coordinates (and velocity) to the control subsystem. Here is the model text for
the vision subsystem:

1  message Target {};                 /* message to control subsystem */
2  process Vision() <
3      output TargetData;             /* output buffer to send a message */
4  >
5  {
6      msg TargetMark;                /* message variable */
7      /* model parameters */
8      float Td = 1000;               /* time for detection */
9      float Tp = 200;                /* pause length */
10     while ( 1 ) {
11         delay( Td );               /* simulate target detection */
12         if ( TargetDetected() ) {  /* target is detected */
13             TargetMark = message Target;
14             send( TargetMark, TargetData );  /* send message to control subsystem */
15         }
16         delay( Tp );               /* do pause */
17     }
18 } /* Vision */

Note that at the current level of detail the vision subsystem is treated simply
as a source of messages about targets (see the message type description in line
1; the message is sent in line 14). The target detection algorithm itself is
represented by the delay in line 11, which specifies the execution time of this
algorithm; no computations are performed there. The possibilities of refining the
model will be considered later, in Sect. 4.3.



Control Subsystem. Let us partition the control algorithm into a high level and a
low level of control. Each level is represented by a separate process. The
algorithm operates as follows. When the high control level process receives the
target coordinates, it requests the coordinates of the manipulator and checks
whether the target can be caught. If catching is possible, the manipulator’s
trajectory is computed and sent to the low control level process. These actions
are repeated for the next position of the target. The model text is as follows:

1  process HiControl() <
2      input TargetData(queue), Feedback(queue);
3      output Control, ManipAcq;
4  >
5  {
6      msg in, out, x;
7      int CatchPossible;
8      float ComputeCP = 100;         /* time to compute catch point */
9      float ComputeTraj = 100;       /* time to compute trajectory */
11     while ( 1 ) {
12         receive( in, TargetData ); /* get target parameters */
13         out = message CoordReq;
14         send( out, ManipAcq );     /* request for manipulator’s coordinates */
15         receive( x, Feedback );    /* get manipulator’s coordinates */
16         /* test for possibility of catching and catch point computation */
17         delay( ComputeCP );
18         if ( ! CatchPossible ) continue;  /* impossible to catch */
19         delay( ComputeTraj );      /* compute trajectory parameters */
20         out = message Traj;        /* send trajectory to low control level */
21         send( out, Control );
22     } /* while */
23 } /* HiControl */



The low control level is responsible for following the computed trajectory in the
presence of external physical influences on the manipulator. Its implementation
is omitted for brevity.



Manipulator. In our model, the manipulator can perform two operations: determine
and send its coordinates, and make an (elementary) move on the control system’s
command. The Manip process may receive messages of two types: a move command
(Move) and a request for coordinates (CoordReq). Having received the former,
Manip makes a move and sends its coordinates to the low control level. Upon
receiving the latter, it replies with its coordinates to the high control level.
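The Manip behaviour amounts to a dispatch on the message type. The C++ sketch below mimics one step of it. The enumerators mirror the message names Move and CoordReq from the text. The `Reply` structure and the returned destination names are illustrative assumptions. Coordinates themselves are left out, matching the current level of detail of the model.

```cpp
#include <cassert>
#include <string>

enum class MsgType { Move, CoordReq };

// Outcome of handling one message (hypothetical representation).
struct Reply {
    std::string destination;  // process that receives the coordinates
    bool moved;               // whether an elementary move was performed
};

// One step of the Manip process:
//   Move     -> make a move, then report coordinates to the low control level;
//   CoordReq -> report coordinates to the high control level.
Reply handle(MsgType t) {
    switch (t) {
        case MsgType::Move:
            return {"LowControl", true};
        case MsgType::CoordReq:
            return {"HiControl", false};
    }
    return {"", false};  // not reached for valid enum values
}
```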

Note that the Manip process is essentially not a part of the computer system; it
is a component of the computer system’s outer environment. By this term we mean
the set of sensors, servomechanisms, etc. that interface a computer control
system with the controlled object.

So, building a software model in the DYANA environment makes it possible to
investigate the program behaviour together with its outer environment, which is
crucial for the development of real-time programs.

4.2 Taking Hardware Into Account 

Besides the program description considered above, the complete model contains: 

— a hardware description (a set of executors);

— a process-to-executor binding description.

The sequential executor is an abstraction of a device performing only one action 
at a time. 

Let us suppose that we use a dedicated signal processor for the vision subsystem
and a general-purpose Intel 80386-based computer for the control subsystem. Then
the hardware description in M2-SIM may look like this:
sequex DSP() {}       /* for vision subsystem */
sequex CPU() {        /* for control subsystem */
    architec Intel386;
}
sequex Manip() {}     /* for manipulator */

Leaving aside some syntactic details, one possible binding description may look
like this:

bind Vision => DSP; 
bind HiControl => CPU; 
bind LowControl => CPU; 
bind Manip => Manip ; 

The HiControl and LowControl processes are bound to the same executor, so they
will run in an interleaved fashion. The actions of Vision and Manip (these
processes are bound to distinct executors) can be executed in parallel, unless
waiting for a message imposes an order on them.
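A deliberately simplified calculation shows how a binding affects timing. Assume each process needs a fixed amount of executor time and never waits for messages. Work bound to one sequential executor is then serialized (interleaved), while different executors overlap. The C++ sketch below computes the resulting completion time under exactly those assumptions. It ignores scheduling order and communication, and the work amounts in the example are invented.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Binding {
    std::string process;
    std::string executor;
    double work;  // executor time the process needs (model time units)
};

// Completion time under the simplifying assumptions: work on one sequential
// executor is interleaved, so it adds up; different executors run in
// parallel, so the overall result is the maximum over executors.
double makespan(const std::vector<Binding>& bindings) {
    std::map<std::string, double> busy;
    for (const auto& b : bindings) busy[b.executor] += b.work;
    double result = 0.0;
    for (const auto& entry : busy) result = std::max(result, entry.second);
    return result;
}
```

For instance, if HiControl needs 200 and LowControl 300 time units on CPU, the CPU alone is busy for 500 units. Binding LowControl to a second executor would reduce the completion time to 300.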

Please note that an executor can represent not only a CPU but any other hardware
component (e.g. a bus, a memory module, a switch, etc.). In this case, the
sequential executor should be accompanied by an appropriate process containing
the operation algorithm of the device being modeled.

4.3 Model Detail. Model-to-Program Transition 

In order to move from a model to a program, you should refine the data
structures in messages and the actions in processes.

For example, the structure of message Target from process Vision above and the
delay statement in line 17 of process HiControl (catch point calculation) could
be detailed as follows:

message Target { float Xt, Yt, Zt, Vx, Vy, Vz };
complex {
    C1 = cos(theta1); S1 = sin(theta1);
    C2 = cos(theta2); S2 = sin(theta2);
    Xm = (l1+l2*C2)*C1; Ym = (l1+l2*C2)*S1; Zm = l2*S2;
}



The complete text of the catch point calculation is omitted due to lack of
space.
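Readers who want to check the refined `complex` block numerically outside the model language can use the same arithmetic as a plain C++ function. The link lengths l1, l2 and the joint angles theta1, theta2 are free variables in the excerpt above, so they become parameters here. The names `Point3` and `manipulatorPosition` are illustrative.

```cpp
#include <cassert>
#include <cmath>

struct Point3 { double x, y, z; };

// Manipulator end-point coordinates, using the same formulas as the complex
// block: theta1 rotates the arm in the x-y plane, theta2 raises link l2.
Point3 manipulatorPosition(double l1, double l2, double theta1, double theta2) {
    const double C1 = std::cos(theta1), S1 = std::sin(theta1);
    const double C2 = std::cos(theta2), S2 = std::sin(theta2);
    return {(l1 + l2 * C2) * C1, (l1 + l2 * C2) * S1, l2 * S2};
}
```

With l1 = 1, l2 = 2 and both angles zero, the arm is fully stretched along the x axis, giving (3, 0, 0).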

Note the complex block above. If the architecture description of the target CPU
is given in the model, the execution time of this block on that CPU will be
estimated during the model run. For details on time estimation, see Sect. 5.2.

When the detailing of the M2-SIM program is finished, it may be converted into a
C++ program for the target computer. For example, part of the conversion result
for the HiControl process is shown below.
#include "mm.h"

void mm_process_HiControl::mm_process_body() {
    mm_message *in = new mm_message;
    int CatchPossible;
    while (1) {
        mm_receive(mm_bf_TargetData, in, 1, "HiControl.mm:12");
        (&mm_sample_message_CoordReq)->mm_copy_message(out);
        mm_send(mm_bf_ManipAcq, out, 2, "HiControl.mm:14");
        mm_delay(ComputeCP, 3, "HiControl.mm:17");
        /* ... */
    }
}

Of course, the details of the target operating system interface must be taken
into account, but that is not the subject of this paper. Here we only want to
show that such a conversion is possible.

An important note: the DYANA environment is capable of reproducing the parallel
program behaviour with respect to the computer system architecture of interest
and a particular outer environment at any stage of model detail. Thus, whether a
real-time control system meets its specified deadlines can be checked at any
stage of detail.

5 The DYANA Architecture 

The architecture of the DYANA system is shown in Fig. 2. The most interesting
components of the DYANA system are described below.



5.1 The Runtime Subsystem 

The runtime subsystem is responsible for the following:

— reproduction of the system’s behaviour based on the process-oriented
discrete-event simulation methodology (before execution, the program description
in M2-SIM is translated into C++, compiled and linked with the DYANA runtime
library);

— collection of the event trace for subsequent analysis. The dynamic stage of
time estimation (see Sect. 5.2) is also performed by the runtime subsystem.
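Process-oriented discrete-event simulation advances model time from one event to the next instead of in fixed steps. The C++ sketch below is a minimal event-list kernel, not DYANA's runtime library. A priority queue ordered by model time drives the Vision loop of Sect. 4.1 (Td = 1000, Tp = 200) and appends a trace record for each message sent. It assumes, hypothetically, that a target is detected on every cycle; all names are illustrative.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Event {
    double time;                  // model time at which the action fires
    std::function<void()> action;
};

// Orders the priority queue so that the earliest event is on top.
struct Later {
    bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

// Minimal event-list kernel: repeatedly pops the earliest event and runs it.
class Kernel {
public:
    double now = 0.0;
    std::vector<std::pair<double, std::string>> trace;  // collected event trace

    void schedule(double delay, std::function<void()> action) {
        events_.push({now + delay, std::move(action)});
    }
    void run(double until) {
        while (!events_.empty() && events_.top().time <= until) {
            Event e = events_.top();
            events_.pop();
            now = e.time;
            e.action();
        }
    }

private:
    std::priority_queue<Event, std::vector<Event>, Later> events_;
};

// One pass of the Vision loop: delay(Td), send Target, delay(Tp), repeat.
// Assumes TargetDetected() is always true.
void visionCycle(Kernel& k, double Td, double Tp) {
    k.schedule(Td, [&k, Td, Tp] {
        k.trace.emplace_back(k.now, "Vision: send Target");
        k.schedule(Tp, [&k, Td, Tp] { visionCycle(k, Td, Tp); });
    });
}
```

Running it up to model time 3500 records sends at 1000, 2200 and 3400, one every Td + Tp = 1200 time units after the first detection.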

The design and development of a distributed discrete-event simulation kernel for
DYANA is now in progress. Our approach to analyzing and choosing the distributed
model time synchronization algorithm is presented in [12].
Fig. 2. Architecture of the DYANA system



5.2 The Subsystem for Time Complexity Estimation 

The aim of this subsystem is to estimate the execution time of a C code block in
complex statements for a given target CPU architecture. The underlying theory and
architecture of this subsystem were described in [2,4]. Briefly, the main idea is
as follows. A combined static-and-dynamic approach is used for time estimation.
During compilation, the C code is analyzed statically, and for every linear code
block in a complex statement a prediction of its execution time is made. During a
model run, when the exact sequence of executed linear code blocks is known, the
time estimate is computed from these static predictions.
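The static/dynamic split can be pictured as a table filled at compile time and a sum taken at run time. In the C++ sketch below, the per-block cycle counts and the clock rate are made-up inputs that stand in for the static predictions. In DYANA those predictions come from the CPU architecture models. Block identifiers and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <vector>

// Static phase (hypothetical output): predicted cycles for each linear code block.
using BlockCycles = std::map<int, long>;

// Dynamic phase: once the run has produced the exact sequence of executed
// blocks, the estimate is the sum of their static predictions divided by the
// clock rate (cycles per second).
double estimateSeconds(const BlockCycles& predicted,
                       const std::vector<int>& executedBlocks,
                       double clockHz) {
    long cycles = 0;
    for (int id : executedBlocks) {
        auto it = predicted.find(id);
        if (it == predicted.end()) throw std::out_of_range("unknown block");
        cycles += it->second;
    }
    return static_cast<double>(cycles) / clockHz;
}
```

Suppose blocks 0, 1 and 2 are predicted to take 20, 50 and 10 cycles. The executed sequence 0, 1, 1, 1, 2 (a loop body run three times) then costs 180 cycles, or 1.8e-4 s at 1 MHz.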

The mapping of computations onto the target CPU architecture is implemented as
follows. A model of the CPU architecture is constructed which captures the
essential features of a certain architecture class that influence execution
time. For example, models of a von Neumann sequential register-stack CPU and of a
RISC CPU are currently supported.

For the register-stack CPU model, algorithms of optimal code generation are
implemented. The execution time estimate is based on the length of the generated
code [2]. During testing, the relative error of the execution time prediction was
in the range of four to ten percent, which is acceptable for practical use.

The RISC CPU model makes it possible to determine statically the pipeline
latencies caused by data dependencies between instructions. Instruction cache
behaviour can also be analyzed in the static phase.
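To show what statically determining pipeline latencies can look like, here is a deliberately simple model in C++. It assumes an in-order, single-issue pipeline in which each instruction has a result latency (already including any forwarding). An instruction cannot issue until all of its source registers are ready. Stall cycles for a linear block are counted in one forward pass. The instruction format, latencies and issue rule are assumptions for illustration. DYANA's RISC model, whose architecture description includes the pipeline structure and instruction processing scheme, is richer.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

struct Instr {
    int dest;               // destination register (-1 if none)
    std::vector<int> srcs;  // source registers
    int latency;            // cycles after issue until the result is usable
};

// Count stall cycles for an in-order, single-issue pipeline: an instruction
// issues one cycle after its predecessor unless a source is not yet ready.
int countStalls(const std::vector<Instr>& block) {
    std::map<int, int> ready;  // register -> first cycle its value is usable
    int issue = -1;            // issue cycle of the previous instruction
    int stalls = 0;
    for (const Instr& in : block) {
        const int earliest = issue + 1;
        int t = earliest;
        for (int r : in.srcs) {
            auto it = ready.find(r);
            if (it != ready.end()) t = std::max(t, it->second);
        }
        stalls += t - earliest;
        issue = t;
        if (in.dest >= 0) ready[in.dest] = t + in.latency;
    }
    return stalls;
}
```

A load into r1 with latency 3, followed immediately by an instruction that reads r1, costs 2 stall cycles; independent instructions cost none.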

The architecture type of a sequential executor can be specified by writing an
identifier of the architecture, as follows:



architec Intel286; 
The architecture description itself (which can be rather cumbersome) is placed
separately and specifies the clock rate, register structure and instruction set
of an executor. For RISC processors this description also contains the pipeline
structure, the instruction processing scheme and the cache configuration.

Our time complexity estimation methodology was applied to the Motorola DSP96002
CPU. On a set of digital signal processing algorithms, zero time prediction error
was achieved, while the prediction time was three orders of magnitude less than
the emulation time (see [14]).



5.3 The Visualization Subsystem 

This subsystem is intended for viewing the event trace collected during a model
run. Events are associated with interprocess communication and with the beginning
and end of internal process actions. The collected trace can be viewed in the
form of a timing diagram (see Fig. 3). The user is able to scroll and scale the
diagram, to select the model components of interest for visualization, and to get
additional information about event and state attributes by clicking on an event
(state). Another important feature is the capability to observe the logical links
between events and to locate the corresponding piece of the process text while
browsing events.



5.4 The Performance Analysis Subsystem 

This subsystem is useful when you need certain integrated performance indices
(such as working time, idle time, processor utilization, message queue length,
etc.). These indices can be computed and displayed as tables, graphs and
histograms. The output data can easily be imported into third-party tools for
advanced processing and report generation.
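Take utilization as an example of such an index. The utilization of an executor over an observation window is its busy time divided by the window length, and the idle time is the remainder. The C++ sketch below assumes the trace has already been reduced to non-overlapping busy intervals of one executor. This representation and all names are illustrative, not DYANA's trace format.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Interval { double start, end; };  // one busy period of an executor

struct Indices { double busy, idle, utilization; };

// Busy intervals are assumed non-overlapping; parts outside [t0, t1) are clipped.
Indices utilization(const std::vector<Interval>& busy, double t0, double t1) {
    assert(t1 > t0);
    double b = 0.0;
    for (const auto& iv : busy) {
        const double s = std::max(iv.start, t0);
        const double e = std::min(iv.end, t1);
        if (e > s) b += e - s;
    }
    return {b, (t1 - t0) - b, b / (t1 - t0)};
}
```

Busy periods [0,100), [300,400) and [900,1100) observed over [0,1000) give 300 busy units, 700 idle units and a utilization of 0.3.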



5.5 The Algorithmic Analysis Subsystem 

This subsystem allows the user to specify the behaviour of the software under
development and to verify the software behaviour against the specification. By
’behaviour’ we mean a partially ordered set of events (see [4] for details).
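One common way to make a partially ordered set of events concrete is the happens-before relation. Events of one process are ordered by program order, and each send precedes the matching receive. Events not ordered either way are concurrent. This is offered only as an illustration, not necessarily the formalism of [4]. The C++ sketch stores the direct edges and answers precedence queries by graph reachability; event numbering and names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Events 0..n-1 with direct causal edges (program order and send -> receive).
class EventOrder {
public:
    explicit EventOrder(int n) : succ_(n) {}
    void addEdge(int before, int after) { succ_[before].push_back(after); }

    // True if a precedes b in the transitive closure of the edges.
    bool precedes(int a, int b) const {
        std::vector<bool> seen(succ_.size(), false);
        std::vector<int> stack{a};
        while (!stack.empty()) {
            int v = stack.back();
            stack.pop_back();
            for (int w : succ_[v]) {
                if (w == b) return true;
                if (!seen[w]) { seen[w] = true; stack.push_back(w); }
            }
        }
        return false;
    }

    // Neither event precedes the other.
    bool concurrent(int a, int b) const { return !precedes(a, b) && !precedes(b, a); }

private:
    std::vector<std::vector<int>> succ_;
};
```

As an example, number the events of the HiControl/Manip exchange from Sect. 4 as follows:
- 0: HiControl sends CoordReq;
- 1: HiControl receives the coordinates;
- 2: Manip receives the request;
- 3: Manip sends its reply;
- 4: Vision sends a message.

With edges 0→1, 2→3, 0→2 and 3→1, event 0 precedes event 3, while the Vision send is concurrent with all four.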

For specifying properties of the system being modeled, a special language was
developed. The approach to specification and verification (with the previous
version of this language) was described in [1]. This language, named M2-SPEC,
makes it possible:

— to specify the actions of a process as relations between process states before
and after an action;

— to specify possible chains of actions using behaviour expressions;

— to specify the properties of a process and of the whole system behaviour as
predicates on behaviour expressions.
An important feature of M2-SPEC is that its syntax is close to that of M2-SIM,
but the semantics of M2-SPEC is equivalent to that of a branching time logic.

Algorithms for the following two problems have now been developed and
prototyped: checking the consistency of the specification itself, and verifying
the specification against the model description in M2-SIM. A more detailed
description of M2-SPEC can be found in [5].



5.6 The Integrated Development Environment 

Notes on Technology of Model Development. As noted above, one of the goals of
M2-SIM development is to support top-down design, that is, to let the user start
from a large-grain model capturing only the general structure of the system to be
simulated and ignoring small-grain details.

Gradually, step by step, the small details are refined and more and more detailed
models are developed.

Such stepwise detailing should be performed in three directions.

1. Structure detail means detailing a component’s internal structure. It is
supported by the fact that the descriptions of distributed programs and
executors are independent of the internal structure of their subcomponents.
Because of this modularity, changes in any part of the model do not require
changing (or even recompiling) other parts.

2. Action detail (i.e. moving from simple prototyping of a process interface to
real data processing). This kind of detail is supported by two ways of specifying
time complexity: the delay statement (it sets the model time delay explicitly but
specifies no computations) and the complex statement (it specifies computations,
and the model time delay is estimated by the special subsystem).

3. Message type and structure detail (e.g. going from checking only the message
type to analyzing message contents). To support such detailing, there are two
families of operators on msg variables: the former use the message type only,
while the latter provide access to message fields.



Integrated Development Environment Features. To increase the efficiency of model
building, a special object-oriented instrumental environment was developed.

This object-oriented IDE relieves the user of the need to work with files. All
objects are stored in a repository (database). Every object has a visual
representation on screen. All objects are arranged into a hierarchy, with models
at the top level. By means of the Model List window, the user can operate on a
model as a single object (e.g. compile or run it). Objects forming a model fall
into one of the following groups: source descriptions, internal representations,
and results of model runs.

For every type of model component (process, executor, message, etc.), the IDE
maintains a component list and lets the user operate on it; see Fig. 4.
Fig. 4. IDE: the Model Components window
A model component can be viewed and edited in different forms, as the user
wishes. Currently, textual and structural (schematic) presentations are
supported.

The main advantages of the environment described above are as follows:

1. The use of a database keeps the external representation (e.g. screen images)
of the set of descriptions consistent with their internal organization.

2. The semantic and time dependencies between source text components can be
tracked more accurately, which reduces the overhead of assembling the compiled
model.

We should highlight an important feature of the developed environment: the
interface description of a component can be combined with more than one version
of the component implementation, and the implementations may be either
sequential or parallel.

This feature makes it possible:

1. to perform stepwise detailing, with the possibility of returning to earlier
stages of development at any time;

2. to experiment with different configurations of the developed and debugged
model (e.g. with different versions of component implementations), which is the
final goal of the user of a simulation system.

6 Conclusion 

The DYANA environment described in this paper is aimed at the following:

— description of software and hardware (at the system level) with a variable
degree of detail;

— analysis of various aspects of a computer system’s behaviour without hardware
prototyping.

The DYANA environment enables the user both to develop programs through
simulation and to choose the proper hardware configuration.

From our point of view, the most interesting features of the project are as
follows:

— the duality of analysis methods (quantitative and algorithmic);

— the time complexity estimation, which helps to avoid target architecture
emulation.

The prototype system is currently implemented in the Sun Solaris environment.
The DYANA system has been tested in the following areas:

— performance analysis of local area networks; 

— software design and development for embedded systems. 
The DYANA system is now being used in the EEC INCO-Copernicus Project DRTESY,
which is aimed at the evaluation (and mutual enhancement) of tools provided by
the project partners on a common realistic case study from the field of embedded
avionics system design. Much attention will be paid to reducing the time
complexity of our algorithmic analysis methods.

The nearest goals of future work also include:

— to extend the database approach to trace storage and processing;

— to develop a library of CPU models for the RISC processors used in embedded
computer systems;

— to build a library of reusable ’basic blocks’ suitable for modelling networks
and embedded hardware and software components.
Abstract

While verification methods are becoming more frequently integrated into software
development projects, software testing is still the main method used to search
for programming errors. Software testing approaches focus on methods for covering
different execution paths of a program, e.g., covering all the statements, or
covering all the possible tests. Such coverage criteria are usually approximated
using ad-hoc heuristics. We present a tool for testing execution paths in
sequential and concurrent programs. The tool, path exploration tool (PET),
visualizes concurrent code as flow graphs and allows the user to interactively
select an (interleaved) execution path. It then calculates and displays the
condition to execute such a path, and allows the user to easily modify the
selection in order to cover additional related paths. We describe the design and
architecture of this tool and suggest various extensions.



1 Introduction 

Software testing techniques [4] are frequently used for debugging programs. Un- 
like software verification techniques, software testing is usually less systematic 
and exhaustive. However, it is applicable even in cases where verification fails 
due to memory and time limitations. Many testing techniques are based on 
criteria for covering execution paths. Conditions are sought for executing the 
code from some point A to some point B, and the code is walked through or 
simulated. Different coverage criteria are given as a heuristic measure for the 
quality of testing. One criterion, for example, advocates trying to cover all the 
executable statements. Other criteria suggest covering all the logical tests, or all 
the flow of control from any setting of a variable to any of its possible uses [9].
Statistics are kept about the effectiveness of the different coverage approaches.
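Statement coverage, for instance, is the fraction of executable statements visited by at least one test path. The C++ sketch below computes it from recorded paths given as lists of statement numbers. This representation and the function name are assumptions for illustration and are not part of the tool described in this paper.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Fraction of the statements 0..numStatements-1 that appear on at least one path.
double statementCoverage(int numStatements, const std::vector<std::vector<int>>& paths) {
    assert(numStatements > 0);
    std::set<int> covered;
    for (const auto& p : paths)
        for (int s : p)
            if (s >= 0 && s < numStatements) covered.insert(s);
    return static_cast<double>(covered.size()) / numStatements;
}
```

With four statements and the paths 0,1,3 and 0,2,3, coverage is 1.0; running 0,1,3 twice only reaches 0.75.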

W.R. Cleaveland (Ed.): TACAS/ETAPS’99, LNCS 1579, pp. 405-419, 1999. 
In this paper, we present a new testing approach and a corresponding testing
tool. The focus of the analysis is an execution path in sequential code, or
interleaved execution paths consisting of sequences of transitions from different
concurrent processes. The system facilitates selecting such paths and calculating
the conditions under which they can be executed. It also assists in generating
variants of a path, such as different interleavings of the path transitions.

The code of the checked programs is compiled into a collection of interconnected flow graphs. The system calculates the most general condition for executing the path and simplifies the resulting formula. We present the tool’s architecture and demonstrate its use.

The system’s architecture includes: 

• An SML program that takes processes, written in a Pascal-like syntax, and produces their corresponding flow graphs.

• A DOT program that is used to compute a good layout for displaying the flow graphs of the different processes.

• A TCL/TK graphical interface that allows selecting and manipulating paths.

• An SML program that calculates path conditions and simplifies them.

• A HOL decision procedure for Presburger arithmetic that is used to further simplify the path conditions.

2 System Architecture 

Research in formal methods focuses mainly on issues such as algorithms, logics, and proof systems. Such methods are often judged by their expressiveness and complexity. However, experience shows that the main obstacles to applying such technology in practice are more mundane: new proof techniques or decision procedures are often rejected because potential users are reluctant to learn a new syntax or to perform the required modeling process.

One approach to avoiding the need for modeling starts from the notation side. It provides design tools based on simple notations such as graphs, automata (e.g., [8]), or message sequence charts [1]. The system is then refined, starting from a simple initial design. Such tools usually provide several features that allow the system designer to perform various automatic or human-assisted checks, and there is some support for checking or guaranteeing the correctness of individual refinement steps. Some design tools even support automatic code generation. This approach avoids a separate modeling step by starting the design with an abstract model and refining it into a full system. Using standard notation, such as message sequence charts [3], conforms with the usual start of the design. On the other hand, automatic code generation is still ad hoc, and the generated code is not expected to be efficient or elegant (although it is, by definition, well documented).

Our approach to testing is complementary to the above. After the software (or parts of it) is designed and coded, one checks its behavior under various conditions or in various environments. One of the motivations of the Pet tool is to avoid the need for modeling and to allow testing to be performed using a notation that is natural for the user. The tool automatically translates the code of the program to be checked into one of the earliest and most useful notations for software, namely flow graphs. The program is written as one or more processes in a syntax similar to Pascal. Figure 1 shows the code for Floyd’s 101 program, as accepted by our tool. We use the combination =/= for inequality.

begin
  y1:=x;
  y2:=1;
  while y1<=100 or y2=/=1 do
  begin
    if y1<=100 then
    begin
      y1:=y1+11;
      y2:=y2+1
    end
    else
    begin
      y1:=y1-10;
      y2:=y2-1
    end
  end;
  z:=y1-10
end.

Figure 1: Floyd’s 101 program
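A direct Python transliteration of Figure 1 makes the program’s behavior easy to check. This is our own sketch, not part of Pet, and the name `floyd_101` is hypothetical. The value computed for z coincides with McCarthy’s 91 function: 91 for every x ≤ 101, and x − 10 otherwise.

```python
def floyd_101(x: int) -> int:
    """Direct transliteration of the Pascal-like program in Figure 1."""
    y1, y2 = x, 1
    # The loop runs while y1 is still small or y2 =/= 1.
    while y1 <= 100 or y2 != 1:
        if y1 <= 100:
            y1 += 11
            y2 += 1
        else:
            y1 -= 10
            y2 -= 1
    return y1 - 10  # z := y1 - 10


if __name__ == "__main__":
    for x in (0, 99, 100, 101, 150):
        print(x, floyd_101(x))
```

Running it prints 91 for the inputs 0, 99, 100 and 101, and 140 for 150.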

The graphical interface includes a window for each process, displaying the original text, and a corresponding window displaying its flow graph. The flow graph is a directed graph, some of whose edges carry labels. Each node in a graph is one of the following: begin, end, test, wait, assign. The begin and end nodes appear as ovals; the test and wait nodes appear as diamonds labeled by the test condition; and the assign nodes appear as boxes labeled by the assignment. There is no out edge from an end node, there are two out edges from a test node, and there is one out edge from every other node. The two out edges from a test node are labeled, one by “yes” and one by “no”. The flow graph generated for the program in Figure 1 appears in Figure 2.
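The node kinds and out-edge rules above can be captured in a small data structure with a well-formedness check. This is a minimal sketch under our own assumptions: the class `FlowGraph`, its methods and the node numbering are hypothetical and do not reflect Pet’s SML representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Number of out edges required for each node kind, as described in the text.
OUT_DEGREE = {"begin": 1, "end": 0, "test": 2, "wait": 1, "assign": 1}


@dataclass
class FlowGraph:
    kinds: Dict[int, str] = field(default_factory=dict)    # node id -> kind
    texts: Dict[int, str] = field(default_factory=dict)    # condition or assignment text
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, "yes"/"no"/"")

    def add_node(self, nid: int, kind: str, text: str = "") -> None:
        self.kinds[nid] = kind
        self.texts[nid] = text

    def add_edge(self, src: int, dst: int, label: str = "") -> None:
        self.edges.append((src, dst, label))

    def check(self) -> List[str]:
        """Return the violations of the out-edge rules (empty if well formed)."""
        problems = []
        for nid, kind in self.kinds.items():
            out = [lbl for (s, _, lbl) in self.edges if s == nid]
            if len(out) != OUT_DEGREE[kind]:
                problems.append(f"node {nid} ({kind}) has {len(out)} out edges")
            elif kind == "test" and sorted(out) != ["no", "yes"]:
                problems.append(f"test node {nid} needs one 'yes' and one 'no' edge")
        return problems
```

For example, the if-then-else inside the loop of Figure 1 becomes a test node with a “yes” edge to `y1:=y1+11` and a “no” edge to `y1:=y1-10`, and `check()` returns an empty list for it.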

The focus objects of the tool are the execution paths. Path information is displayed in two additional windows: one displays the currently selected execution path, and the other displays the most general condition to execute the selected path. In order to maintain the connection between the code and the model (the flow graph), the different windows are context sensitive: pointing at a node (e.g., a test or an assignment box) in a flow graph window highlights the corresponding code in the process source window.¹ A selected path in the 101 program appears in Figure 3. Each transition appears within brackets that correspond to its shape (and color) in the flow graph: parentheses for ovals, square brackets for boxes, and angle brackets for diamonds. Inside the brackets there is a pair consisting of the process name and the number of the transition (as it appears in the flow graph). If several processes are involved, transitions of different processes appear with different amounts of indentation from the left margin. If the cursor points at a transition listed in this window, the corresponding item in the flow graph and the corresponding text are highlighted.

2.1 Path Operations 

Software testing is based on inspecting paths. Therefore, it is of great importance to allow convenient selection of execution paths. Different coverage techniques suggest criteria for the appropriate coverage of a program. Our tool leaves the choice of paths to the user. (A future version, in which various path selection criteria will be used to suggest the tested paths automatically, is under construction.) Once the source code is compiled into a flow graph, or a collection of flow graphs, the user can choose the test path by clicking on the appropriate constructs in the flow graphs.

The selected path also appears in a separate window, where each line lists the selected node, its process, and its shape (the lines are also indented according to the number of the process to which they belong). To make the connection between the code, the flow graph, and the selected path clear, sensitive highlighting is used. For example, when the cursor points at some node in the path window, the corresponding node in the flow graph is highlighted, as is the corresponding text of the process.

Once a path is fixed, the condition to execute it is calculated. The tool allows altering the path by removing nodes from its end, in reverse order, or by appending new nodes to it. This allows, for example, selecting the alternative branch of a condition after the nodes that were chosen past that condition have been removed.

¹Our choice was, in the case of a test, to highlight the entire minimal programming construct associated with it, such as an if-then-else statement or a while loop.
Figure 2: A flow graph for the program in Figure 1

(tests : 0)

[tests : 1] 

[tests : 2] 

<testS : 8> 

<testS : 7> 

[tests : 5] 

[tests : S] 

<testS : 8> 

[tests : 9] 

(tests : 10) 

Figure 3: A selected path in the 101 program 

Another way to alter a path is to keep the same transitions but interleave them differently. When dealing with concurrent programs, the way the execution of transitions from different processes is interleaved is perhaps the most important source of problems. The Pet tool allows the user to flip the order of adjacent transitions on the path if they belong to different processes. It is easy to check that, by repeatedly flipping the order in this way, one can obtain any possible interleaving of the selected transitions, i.e., any order that preserves the order of the transitions within each process.
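The claim that repeated flips reach every interleaving can be checked on small examples. The sketch below is a hypothetical helper, not Pet’s TCL/TK code. It represents a path as a sequence of (process, node number) pairs, allows only flips of adjacent transitions from different processes, and collects every order reachable by such flips in a breadth-first search.

```python
from collections import deque
from typing import List, Set, Tuple

Step = Tuple[str, int]  # (process name, node number), as in the path window


def flip(path: Tuple[Step, ...], i: int) -> Tuple[Step, ...]:
    """Swap the transitions at positions i and i+1 (different processes only)."""
    if path[i][0] == path[i + 1][0]:
        raise ValueError("cannot permute transitions of the same process")
    p = list(path)
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)


def reachable_interleavings(path: List[Step]) -> Set[Tuple[Step, ...]]:
    """All orders obtainable from `path` by repeated legal flips."""
    start = tuple(path)
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for i in range(len(cur) - 1):
            if cur[i][0] != cur[i + 1][0]:
                nxt = flip(cur, i)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen
```

For the seven-transition path of Figure 4 (three transitions of C1 and four of C2), the search finds all C(7, 3) = 35 interleavings, and each of them keeps the per-process order intact.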

2.2 Path Condition 

The most important information provided by Pet is the condition to execute a selected path. An important point to note is that an execution path in a set of flow graphs is really a sequence of edges which, when restricted to each of the processes involved, forms a contiguous sequence. However, when specifying an execution path, it seems most natural to give a selection of nodes to be executed. For most nodes, there is a one-to-one correspondence between the nodes in a flow graph and their out edges. The subtle case is when a test node is selected. Selecting such a node does not tell us how it was executed, since the condition may be either true or false. The execution of a test is determined by whether its “yes” or “no” out edge was taken, which we can infer from the successor of the test in the same process. Thus, if a test node is the last transition of some process in the selected path, it does not contribute to the path condition, as the information about how it was executed is not given.

Let $s_1 s_2 \cdots s_n$ be a sequence of nodes. For each node $s_i$ on the path, we define:

type($s_i$) is the type of the transition in $s_i$. This can be one of the following: begin, end, test, wait, assign.

proc($s_i$) is the process to which $s_i$ belongs.

cond($s_i$) is the condition of $s_i$, in case $s_i$ is either a test or a wait node.

branch($s_i$) is defined for a test node $s_i$: it is the label (“yes” or “no”) of the out edge taken, if $s_i$ has a successor on the path that belongs to the same process, and “undefined” otherwise.

expr($s_i$) is the expression assigned to a variable, in case $s_i$ is an assign node.

var($s_i$) is the variable assigned, in case $s_i$ is an assign node.

$p[v/e]$ is the predicate $p$ in which all the (free) occurrences of the variable $v$ are replaced by the expression $e$.

The following algorithm calculates the path condition. Notice that the condition is calculated from the tail of the path to its head.

current_pred := ‘true’;
for i := n to 1 step -1 do
begin
  case type(s_i) do
    test =>
      case branch(s_i) do
        ‘yes’ =>
          current_pred := current_pred ∧ cond(s_i)
        ‘no’ =>
          current_pred := current_pred ∧ ¬cond(s_i)
        ‘undefined’ =>
          current_pred := current_pred
      end case
    wait =>
      current_pred := current_pred ∧ cond(s_i)
    assign =>
      current_pred := current_pred[var(s_i)/expr(s_i)]
  end case;
  simplify(current_pred)
end



It is interesting to note that the meaning of the calculated path condition differs between sequential and concurrent programs. In a sequential program, consisting of one process, the condition expresses all the possible assignments that ensure executing the selected path, starting from the first selected node. When concurrency is allowed, the condition expresses the assignments that make the execution of the selected path possible. Thus, when concurrency is present, the path condition does not guarantee that the selected path is executed, as there might be alternative paths with the same variable assignments.
Figure 4: Two simple concurrent processes

Figure 4 shows an example of two simple processes that share the variable a. The Pascal-like code for the processes is as follows:

C1:
begin
  a:=5
end

C2:
begin
  a:=2;
  wait a=5
end

Consider the following path:

(C1 : 0)
(C2 : 0)
[C2 : 1]
[C1 : 1]
<C2 : 2>
(C2 : 3)
(C1 : 2)

In this path, the assignment ‘a := 5’ of the first process is executed after the assignment ‘a := 2’ of the second process; hence the wait condition can be passed, and the path can be completed. This does not depend on the value of any variable, so the path condition is ‘true’. If we now switch the third and fourth lines, i.e., the two assignments to the variable a, the path cannot be passed, independently of the initial values of the variables. Thus, in this case the path condition is ‘false’.
Switching the order of a pair of adjacent transitions is done by moving the mouse to the first transition in the pair and clicking a mouse button. The tool does not allow permuting transitions that belong to the same process, since their relative order is fixed by the program.
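The backward computation of the path condition can be prototyped in a few lines. The sketch below is our own illustration, not the SML implementation. It represents predicates as Python functions over variable environments rather than as symbolic formulas, so the substitution p[v/e] becomes “evaluate p in the state after the assignment”, and no simplification is performed. The names `Node`, `branch` and `path_condition` are hypothetical. Applied to the two paths of Figure 4, it yields a condition that holds for every tested value of a and one that fails for every tested value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Env = Dict[str, int]
Pred = Callable[[Env], bool]


@dataclass
class Node:
    kind: str                                     # 'begin', 'end', 'test', 'wait' or 'assign'
    cond: Optional[Pred] = None                   # cond(s) for test and wait nodes
    var: Optional[str] = None                     # var(s) for assign nodes
    expr: Optional[Callable[[Env], int]] = None   # expr(s) for assign nodes
    yes: Optional[int] = None                     # successor on the "yes" edge of a test
    no: Optional[int] = None                      # successor on the "no" edge of a test


Graphs = Dict[str, Dict[int, Node]]               # process name -> node number -> node
Path = List[Tuple[str, int]]


def conj(p: Pred, q: Pred) -> Pred:
    return lambda env: p(env) and q(env)


def neg(p: Pred) -> Pred:
    return lambda env: not p(env)


def subst(p: Pred, var: str, expr: Callable[[Env], int]) -> Pred:
    # p[var/expr]: evaluate p in the state produced by the assignment.
    return lambda env: p({**env, var: expr(env)})


def branch(path: Path, i: int, graphs: Graphs) -> str:
    proc, nid = path[i]
    node = graphs[proc][nid]
    for other_proc, other_nid in path[i + 1:]:
        if other_proc == proc:                    # next selected node of the same process
            if other_nid == node.yes:
                return "yes"
            if other_nid == node.no:
                return "no"
            raise ValueError(f"{proc}:{other_nid} does not follow {proc}:{nid}")
    return "undefined"


def path_condition(path: Path, graphs: Graphs) -> Pred:
    pred: Pred = lambda env: True
    for i in range(len(path) - 1, -1, -1):        # from the tail of the path to its head
        node = graphs[path[i][0]][path[i][1]]
        if node.kind == "test":
            b = branch(path, i, graphs)
            if b == "yes":
                pred = conj(pred, node.cond)
            elif b == "no":
                pred = conj(pred, neg(node.cond))
        elif node.kind == "wait":
            pred = conj(pred, node.cond)
        elif node.kind == "assign":
            pred = subst(pred, node.var, node.expr)
    return pred
```

With the graphs of Figure 4 encoded as `Node` objects, `path_condition(good, graphs)` is true for every value of a, while swapping the two assignments makes it false for every value, matching the discussion above.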

2.3 Formula Simplification 

The primary information object provided by the Pet tool is a quantifier-free first order formula describing the condition under which a path is executed. In the prototype, we assume that the mathematical model is arithmetic over the integers. As shown in the previous subsection, these conditions are calculated symbolically and can therefore be quite complicated to understand. In most cases, the automatically generated expression is equivalent to a much simpler expression.

Simplifying expressions is a hard task. For one thing, it is not clear what a good measure is for one expression being simpler than another. Moreover, deciding the satisfiability or validity of first order formulas is, in general, undecidable. However, such limitations should not rule out heuristic attempts to simplify formulas, and for some restricted classes of formulas decision procedures do exist.

Our approach to simplifying first order formulas is first to apply several simple term-rewriting rules that perform common-sense, general-purpose simplifications. In addition, we check whether the formula belongs to Presburger arithmetic, i.e., uses only addition, multiplication by a constant, and comparison. If this is the case, a decision procedure can be used to simplify the formula.

The simplification that is performed includes the following rewriting:

• Boolean simplification, e.g., $\varphi \wedge \mathit{true}$ is converted into $\varphi$, and $\varphi \wedge \mathit{false}$ is converted into $\mathit{false}$.

• Eliminating constant comparisons, e.g., replacing $1 > 2$ by $\mathit{false}$.

• Constant substitution. For example, in the formula $(x = 5) \wedge \varphi$, every (free) occurrence of $x$ in $\varphi$ is replaced by 5.

• Arithmetic cancellation. For example, the expression $(x + 2) - 3$ is simplified into $x - 1$, and $x * 0$ is replaced by 0. However, notice that $(x/2) * 2$ is not simplified, as integer division is not the inverse of integer multiplication.
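A minimal sketch of these rewriting rules follows, using our own tuple encoding of terms and formulas rather than Pet’s SML datatypes; the names `simp`, `simp_term` and `subst_var` are hypothetical. It covers Boolean simplification, constant comparison, constant substitution for a left conjunct of the form x = c, and additive cancellation, and it deliberately leaves integer division untouched.

```python
# Terms: int constants, variable names (str), or (op, t1, t2) with op in + - * /.
# Formulas: TRUE, FALSE, ("not", f), ("and"|"or", f, g), or (cmp, t1, t2).
TRUE, FALSE = ("true",), ("false",)
CMP = {"<": lambda a, b: a < b, ">": lambda a, b: a > b,
       "<=": lambda a, b: a <= b, "=": lambda a, b: a == b}
ARITH = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def offset(t, k):
    """Build t + k in normalized form (just t when k == 0)."""
    if k == 0:
        return t
    return ("+", t, k) if k > 0 else ("-", t, -k)


def simp_term(t):
    if not isinstance(t, tuple):                  # constant or variable
        return t
    op, a, b = t[0], simp_term(t[1]), simp_term(t[2])
    if op in ARITH and isinstance(a, int) and isinstance(b, int):
        return ARITH[op](a, b)                    # constant folding
    if op == "*" and 0 in (a, b):
        return 0                                  # x * 0 = 0
    if op in ("+", "-") and isinstance(b, int):
        k = b if op == "+" else -b
        if isinstance(a, tuple) and a[0] in ("+", "-") and isinstance(a[2], int):
            c = a[2] if a[0] == "+" else -a[2]
            return offset(a[1], c + k)            # (x + c) + k  ->  x + (c + k)
        return offset(a, k)
    return (op, a, b)                             # e.g. (x / 2) * 2 stays as it is


def subst_var(f, x, c):
    """Replace every occurrence of variable x by the constant c."""
    if f == x:
        return c
    if isinstance(f, tuple):
        return (f[0],) + tuple(subst_var(g, x, c) for g in f[1:])
    return f


def simp(f):
    if f in (TRUE, FALSE):
        return f
    op = f[0]
    if op == "not":
        g = simp(f[1])
        return FALSE if g == TRUE else TRUE if g == FALSE else ("not", g)
    if op in ("and", "or"):
        g, h = simp(f[1]), simp(f[2])
        if op == "and" and g[0] == "=" and isinstance(g[1], str) and isinstance(g[2], int):
            h = simp(subst_var(h, g[1], g[2]))   # (x = c) and phi: substitute c for x
        unit, zero = (TRUE, FALSE) if op == "and" else (FALSE, TRUE)
        if zero in (g, h):
            return zero                           # phi and false = false
        if g == unit:
            return h                              # true and phi = phi
        if h == unit:
            return g
        return (op, g, h)
    a, b = simp_term(f[1]), simp_term(f[2])
    if isinstance(a, int) and isinstance(b, int):
        return TRUE if CMP[op](a, b) else FALSE   # constant comparison
    return (op, a, b)
```

For instance, `simp(("and", ("=", "x", 5), ("<", ("+", "x", 2), 10)))` first substitutes 5 for x in the second conjunct, reduces it to true, and returns `("=", "x", 5)`.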

In case the formula is in Presburger arithmetic, we can decide [7] whether the formula $\varphi$ is unsatisfiable, i.e., constantly false, or valid, i.e., constantly true. The first case is done by deciding $\neg \exists x_1 \exists x_2 \cdots \exists x_n \, \varphi$, and the second by deciding $\forall x_1 \forall x_2 \cdots \forall x_n \, \varphi$, where $x_1, \ldots, x_n$ are the variables that appear in $\varphi$. If the formula is not in Presburger arithmetic, one can still try to decide whether each of its maximal Presburger subformulas is equivalent to true or false.

Another way of using the decision procedure for Presburger arithmetic is to check whether there are variables that are not needed in the formula and can hence be discarded. For example, consider a Presburger arithmetic formula $\varphi(x_1, x_2, \ldots, x_n)$. We can check whether the formula depends on the variable $x_n$ by checking

$$\forall x_1 \forall x_2 \cdots \forall x_{n-1} \forall x_n \forall x_n' \, \big(\varphi(x_1, x_2, \ldots, x_n) \leftrightarrow \varphi(x_1, x_2, \ldots, x_n')\big)$$

If this formula is true, $\varphi$ does not depend on $x_n$, and we can replace $x_n$ by 0 everywhere.
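As a small worked illustration (our own example, not taken from Pet), let $\varphi(x_1, x_2) = (x_1 + x_2 > x_2 + 3)$. Both sides of the equivalence reduce to $x_1 > 3$, so the independence test for $x_2$ succeeds, and $x_2$ can be replaced by 0:

$$
\begin{aligned}
&\forall x_1 \forall x_2 \forall x_2' \, \big((x_1 + x_2 > x_2 + 3) \leftrightarrow (x_1 + x_2' > x_2' + 3)\big)
\;\equiv\; \forall x_1 \, \big((x_1 > 3) \leftrightarrow (x_1 > 3)\big) \;\equiv\; \mathit{true}, \\
&\varphi(x_1, 0) = (x_1 + 0 > 0 + 3) \;\equiv\; (x_1 > 3).
\end{aligned}
$$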

2.4 Implementation 

The Pet system consists mainly of a graphical interface and a program responsible for compilation and calculation, as shown in Figure 5. The graphical interface is responsible for selecting and updating execution paths; it was implemented in TCL/TK. Compilation and calculations are done by an SML program. SML was selected since it allows simple and efficient symbolic manipulations such as subformula substitution.

The SML program runs as a server process that receives processing requests from the graphical interface. One such request is of the form

file processname

and results in the compilation of the process into a flow graph. Another type of request is of the form

path processname:node ... processname:node

with a (reversed) selected path. The SML program calculates the weakest (most general) condition to execute the path and returns it to the graphical interface for display.
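The two request forms can be parsed as follows. This is a sketch under our own assumptions: tokens are separated by whitespace, node numbers are integers, and “reversed” means that the tail of the path is sent first. The names `parse_request`, `FileRequest` and `PathRequest` are hypothetical, and Pet’s actual wire format may differ in details not given here.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FileRequest:
    process: str                      # "file processname"


@dataclass
class PathRequest:
    steps: List[Tuple[str, int]]      # path stored head first


def parse_request(line: str) -> Union[FileRequest, PathRequest]:
    words = line.split()
    if not words:
        raise ValueError("empty request")
    kind, args = words[0], words[1:]
    if kind == "file" and len(args) == 1:
        return FileRequest(args[0])
    if kind == "path":
        steps = []
        for item in args:
            proc, sep, node = item.rpartition(":")
            if not sep or not proc:
                raise ValueError(f"malformed path item: {item!r}")
            steps.append((proc, int(node)))
        steps.reverse()               # assumption: the interface sends the tail first
        return PathRequest(steps)
    raise ValueError(f"unknown request: {line!r}")
```

Since the path condition is computed from the tail of the path to its head, a real server could equally keep the received order and skip the final reversal.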

The SML program informs the graphical interface when compilation is done, and also prepares several files that the graphical interface uses. These files are:

• A DOT file containing the description of the flow graph that corresponds to the compiled process, according to the DOT syntax (see the Unix manual, or [6]). This allows using the DOT program to draw the graph.

• An adjacency list, specifying for each node of the graph its immediate successors. This information allows the graphical interface to control path selection.

• A list of pointers to the beginning and end of the text that corresponds to each graph item. This file allows connecting the flow graph with the text windows, so that the text corresponding to the currently selected node is highlighted.
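To illustrate the kind of DOT description involved, the sketch below emits a flow graph in DOT syntax using the shape conventions of Section 2: ellipses for begin and end, diamonds for test and wait, and boxes for assignments, with “yes”/“no” edge labels. The function `to_dot` and its input encoding are hypothetical; the exact file Pet writes is not specified here.

```python
from typing import Dict, List, Tuple

SHAPE = {"begin": "ellipse", "end": "ellipse", "test": "diamond",
         "wait": "diamond", "assign": "box"}


def to_dot(name: str, nodes: Dict[int, Tuple[str, str]],
           edges: List[Tuple[int, int, str]]) -> str:
    """nodes: id -> (kind, text); edges: (src, dst, label), label '' if none."""
    lines = [f'digraph "{name}" {{']
    for nid, (kind, text) in sorted(nodes.items()):
        label = f"{nid}: {text}" if text else f"{nid}: {kind}"
        label = label.replace('"', '\\"')          # escape quotes inside DOT strings
        lines.append(f'  n{nid} [shape={SHAPE[kind]}, label="{label}"];')
    for src, dst, lbl in edges:
        attr = f' [label="{lbl}"]' if lbl else ""
        lines.append(f"  n{src} -> n{dst}{attr};")
    lines.append("}")
    return "\n".join(lines)
```

With Graphviz installed, a command such as `dot -Tdot flow.dot -o laid_out.dot` writes the same graph back with layout coordinates in `pos` attributes.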
Figure 5: General architecture of the Pet system



The graphical interface uses the DOT program to draw flow graphs. The SML code prepares a DOT file describing the nodes, arrows, and text of the flow graph. The DOT program processes this file and computes a layout for a visual presentation of the graph, producing another DOT file in which the graph objects are annotated with specific coordinates. The TCL/TK graphical interface reads the latter file and uses it to draw the graph.

The SML program is compiled under the HOL environment. This allows our system to use the Presburger arithmetic decision procedure included in HOL for simplifying arithmetic expressions.

3 Examples 

Figure 6: A mutual exclusion example

Consider the simple protocol in Figure 6, intended to achieve mutual exclusion. In this protocol, a process can enter the critical section if the shared variable turn does not have the value of the other process. The code for the first process is as follows:

begin
  while true do
  begin
    while turn=1 do begin (* no-op *) end;
    (* critical section *)
    turn:=1
  end
end.

The second process is similar, with the constant values 1 changed to 0.

When we select the following path, which admits the second process, mutex1, into its critical section while the first process, mutex0, is busy waiting:

(mutex0 : 0)
(mutex1 : 0)
<mutex1 : 5>
<mutex0 : 5>
<mutex1 : 2>
<mutex0 : 2>
[mutex1 : 3]
[mutex0 : 1]

we get the path condition turn = 1, namely that the second process enters its critical section first if the initial value of the variable turn is 1. When we check a path that immediately enters both critical sections, namely:

(mutex0 : 0)
(mutex1 : 0)
<mutex1 : 5>
<mutex0 : 5>
<mutex0 : 2>
<mutex1 : 2>
[mutex0 : 3]
[mutex1 : 3]
we get the path condition turn ≠ 1 ∧ turn ≠ 0. This condition shows that mutual exclusion is violated if, say, the initial value of turn is 3, which indicates an error in the design of the protocol. The problem is that a process enters its critical section whenever turn is not set to the value of the other process. This can be fixed by allowing a process to enter the critical section only if turn is set to its own value.
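The path condition can be cross-checked by brute force. The sketch below is our own abstraction, not something Pet produces. Each process has two control locations, the busy-wait test and the critical section, and each test and each assignment to turn counts as one atomic step. The function explores all interleavings breadth first. It reports a violation exactly when the initial value of turn is neither 0 nor 1, and reports none for the suggested fix. The name `mutex_violated` and the `fixed` flag are hypothetical.

```python
from collections import deque


def mutex_violated(initial_turn: int, fixed: bool = False) -> bool:
    """Return True if some interleaving puts both processes in the critical section."""
    start = ("test", "test", initial_turn)   # (location of mutex0, location of mutex1, turn)
    seen = {start}
    queue = deque([start])
    while queue:
        loc0, loc1, turn = queue.popleft()
        if loc0 == loc1 == "cs":
            return True
        locs = [loc0, loc1]
        for i in (0, 1):
            other = 1 - i
            new_locs, new_turn = list(locs), turn
            if locs[i] == "test":
                # original protocol: enter unless turn holds the other's value;
                # fixed protocol: enter only if turn holds one's own value
                may_enter = (turn == i) if fixed else (turn != other)
                if not may_enter:
                    continue                  # busy waiting leaves the state unchanged
                new_locs[i] = "cs"
            else:
                new_locs[i] = "test"          # leave the critical section
                new_turn = other              # and execute turn := other
            nxt = (new_locs[0], new_locs[1], new_turn)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Note that with the fix and an initial value such as 3, neither process ever enters, so the fixed protocol is safe but still assumes a sensible initial value of turn.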



4 Extensions and Future Work 

In this section, we describe work in progress and planned extensions of the tool. The current implementation of the Pet tool provides a basic framework for testing sequential and interleaved execution paths, and it was designed to support the addition of many testing techniques and features.

Software testing suggests various coverage criteria. For example, one might want to check paths that together cover every executable statement, or paths from a statement where a variable is set to all or some of the statements where it is used [9]. Such coverage techniques can be integrated into our tool by assisting the selection of a path according to these criteria. For example, when an assignment node is selected, the Pet tool can suggest all the nodes where the assigned variable may later be used. These nodes are highlighted in a color different from the other nodes, and the user can select one of them. The Pet tool can then extend the current path with a shortest path from the current node to the selected node. Statistics about the quality of coverage can also be collected.
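The suggested extension can be sketched in two steps: find candidate use nodes of a variable, then extend the path by a shortest path found with breadth-first search. This is our own sketch. `use_nodes` is a crude over-approximation that ignores whether the variable is redefined on the way, and the node numbering of `SUCC` and `USES` is a hypothetical encoding of the 101 program, not the numbering of Figure 2.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical flow graph of the 101 program: 0 begin, 1 y1:=x, 2 y2:=1,
# 3 while-test, 4 if-test, 5 y1:=y1+11, 6 y2:=y2+1, 7 y1:=y1-10,
# 8 y2:=y2-1, 9 z:=y1-10, 10 end.
SUCC = {0: [1], 1: [2], 2: [3], 3: [4, 9], 4: [5, 7], 5: [6], 6: [3],
        7: [8], 8: [3], 9: [10], 10: []}
USES = {1: {"x"}, 3: {"y1", "y2"}, 4: {"y1"}, 5: {"y1"}, 6: {"y2"},
        7: {"y1"}, 8: {"y2"}, 9: {"y1"}}


def use_nodes(uses: Dict[int, Set[str]], var: str) -> List[int]:
    """Nodes whose condition or right-hand side mentions var (candidates to highlight)."""
    return sorted(n for n, vs in uses.items() if var in vs)


def shortest_path(succ: Dict[int, List[int]], src: int, dst: int) -> Optional[List[int]]:
    """Breadth-first search in the flow graph; returns [src, ..., dst] or None."""
    parent: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        n = queue.popleft()
        if n == dst:
            path: List[int] = []
            cur: Optional[int] = n
            while cur is not None:           # walk back along the BFS tree
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for m in succ.get(n, []):
            if m not in parent:
                parent[m] = n
                queue.append(m)
    return None
```

After selecting the assignment `y1:=x` (node 1), the candidates for y1 are nodes 3, 4, 5, 7 and 9; choosing node 9 extends the path by 2, 3, 9.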

Another extension deals with testing the different interleavings that can be formed from a given set of transitions. Interleaving concurrent transitions in different orders is a major pitfall in concurrent programming. Currently, the user can interleave transitions of different processes in various ways by commuting them. An extension is being developed to facilitate a more efficient and thorough inspection of different interleaved sequences. The main idea is that many permutations of concurrently executed transitions do not lead to different results; consider, for example, two transitions that involve completely different variables. Instead of only allowing the user to select particular adjacent transitions to be commuted, the tool will successively generate different permutations of the selected interleaved sequence.
Using the compiled knowledge about the variables assigned and used by the transitions, a dependency relation between the transitions can be calculated [2]. The tool will calculate the next permutation that is not equivalent to the current one up to permuting adjacent independent transitions. (If it is equivalent, the path condition is guaranteed to be the same.) The new permutation will then be displayed and its path condition calculated. Thus, the tool will help the user cycle between different interleaved sequences that may give rise to different behaviors. (One can formalize this feature as presenting to the user different representatives of Mazurkiewicz traces [5].) This extension is also connected to path suggestion: if one follows the system’s recommendation about how to continue a path from each given state, one does not have to worry about how to interleave these paths, as a systematic and exhaustive search of the interleavings can be performed. Of course, one has to be careful, as permuting interleaved sequences can lead to an exponential number of possibilities.
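The dependency relation and the equivalence check can be sketched as follows. This is our own illustration: read and write sets are supplied by hand here, whereas the text assumes they are derived from the compiled code. Two transitions are treated as dependent if they belong to the same process, or if one writes a variable that the other reads or writes. Two interleavings are then equivalent exactly when their projections onto every dependent pair coincide, which is the standard projection characterization of Mazurkiewicz trace equivalence.

```python
from itertools import combinations_with_replacement
from typing import Dict, FrozenSet, List, Tuple

Step = Tuple[str, int]                                      # (process, node number)
Access = Dict[Step, Tuple[FrozenSet[str], FrozenSet[str]]]  # step -> (reads, writes)
NONE: FrozenSet[str] = frozenset()


def dependent(a: Step, b: Step, acc: Access) -> bool:
    if a[0] == b[0]:
        return True                       # same process: program order is fixed
    ra, wa = acc.get(a, (NONE, NONE))
    rb, wb = acc.get(b, (NONE, NONE))
    return bool(wa & (rb | wb) or wb & ra)


def equivalent(u: List[Step], v: List[Step], acc: Access) -> bool:
    """True iff u and v agree on the projection onto every dependent pair."""
    letters = sorted(set(u) | set(v))
    for a, b in combinations_with_replacement(letters, 2):
        if dependent(a, b, acc):
            keep = {a, b}
            if [x for x in u if x in keep] != [x for x in v if x in keep]:
                return False
    return True
```

For the processes of Figure 4, with both assignments writing a and the wait reading a, swapping the two begin nodes gives an equivalent interleaving, while swapping the two assignments does not. This matches the change of the path condition from ‘true’ to ‘false’.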

Another direction of future development is to extend the programming language in which programs are written to include arrays and other data types, as well as subroutines. For arrays, the difficulty lies in calculating the precondition when array subscripts are given by complex arithmetic expressions.

We are also exploring different ways of presenting information to the user. Although path conditions are in many cases simple to understand, there are cases where the user may find them difficult to use. Allowing the user to supply finite ranges for the program variables enables the system to check whether there are values in the given ranges that satisfy the path condition.
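A minimal sketch of such a range check, assuming a path condition is available as a Python predicate over variable values (our own representation, as in the earlier sketches; the names `solutions` and `classify` are hypothetical). Plain enumeration of this kind only answers questions relative to the supplied ranges and is not a substitute for a decision procedure.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

Env = Dict[str, int]


def solutions(cond: Callable[[Env], bool], ranges: Dict[str, Sequence[int]]) -> Iterator[Env]:
    """Yield every assignment within the given finite ranges that satisfies cond."""
    names = sorted(ranges)
    for values in product(*(ranges[n] for n in names)):
        env = dict(zip(names, values))
        if cond(env):
            yield env


def classify(cond: Callable[[Env], bool], ranges: Dict[str, Sequence[int]]) -> str:
    """'unsatisfiable', 'valid' or 'satisfiable', relative to the ranges only."""
    total = 1
    for r in ranges.values():
        total *= len(r)
    count = sum(1 for _ in solutions(cond, ranges))
    if count == 0:
        return "unsatisfiable"
    return "valid" if count == total else "satisfiable"
```

For the mutual exclusion example, the condition turn ≠ 1 ∧ turn ≠ 0 has the solutions turn = 2 and turn = 3 in the range 0 to 3.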

Finally, program slicing [10] can be used to extract the projection of the program consisting of the statements that affect a variable at a particular location. Such an analysis can also be calculated and displayed using our graphical interface.
Abstract. Techniques from different areas are combined to analyze parallel and distributed software within a common framework. They include bisimulation equivalences known from process algebras, Kronecker representations of labelled (stochastic) transition systems known from performance analysis using Markov chains, and ordered natural decision diagrams (ONDDs) as a generalization of the ordered binary decision diagrams that are well known in hardware verification and in the analysis of Boolean functions in general. The employed analysis tools are all part of a toolbox built on the abstract Petri net notation (APNN), a model interchange format using Petri nets. In this context we experience a cross-fertilization of different research fields within a Petri net setting. We use Lamport’s mutual exclusion algorithm to discuss the strengths and limitations of the presented approach.

Keywords: Software analysis, Petri nets, Kronecker algebra, bisimulation, ordered natural decision diagrams, model checking



1 Introduction 

Development of correct and efficient parallel and distributed software is by no means a trivial task. Many formalisms have been developed to obtain a clear distinction between the sequential and parallel elements of a task. A popular point of view is to consider such systems as a set of communicating sequential processes, e.g., in modeling formalisms like CCS [20], CSP [14], automata networks [1], and superposed generalised stochastic Petri nets [12], but also in programming interfaces like MPI, PVM, or distributed C (EPOCA). Communication between processes is either asynchronous (by message passing) or synchronous (rendezvous). The latter is more general, since every asynchronous communication operation can easily be described using synchronous communication primitives (see [20]), whereas representing synchronous communication with asynchronous primitives is much harder. Thus synchronous communication is usually used in low-level formalisms supporting functional analysis, whereas asynchronous communication is often part of high-level paradigms for parallel or distributed programming. Since this paper focuses on model-based analysis, we consider synchronous communication.

Concurrent programs have a potential for speedup due to parallel execution, but carry the risk of deadlock and other unexpected, undesired behavior. Consequently, functional analysis of parallel programs is an important topic. A classical brute-force approach is the enumeration of all possible cases, which results in a set of reachable states. Its general drawback is the state space explosion problem: even trivial examples can produce excessively large state spaces. Various techniques have been developed in different research areas to handle this problem, at least for certain special but still relevant cases, up to large state spaces. Examples of such techniques are ordered binary decision diagrams (OBDDs) for verification, Kronecker representations for performance analysis (also useful for model checking), and the reduction of components according to equivalences from process algebras, to mention only a few. We consider combinations of such concepts and demonstrate how they fit together for the analysis of concurrent software. We use Lamport’s mutual exclusion algorithm as a small example to indicate the strengths and limitations of our approach. The software tools we present belong to a much larger set of tools around a model interchange format named “abstract Petri net notation” (APNN) [4] and its toolbox [3].

The paper is organized as follows: Section 2 describes the combination of different methods to represent and analyze models of synchronously communicating components. These methods include directed acyclic graphs for state space representation, Kronecker operations to describe transition relations, and equivalences to reduce components before composition. Afterwards, in Sect. 3, we introduce a toolbox that supports modular analysis approaches. In Sect. 4, the modular analysis of a non-trivial example is presented. The paper ends with a conclusion including an outline of further work.

2 Representation and Analysis of Synchronized Processes 

A process may, but need not, be sequential; hence we allow atomic operations, sequences of such operations, and fork and join inside a process. Communication takes place via synchronous interaction. This scenario can be formalized in various ways, e.g., as process algebras like CCS [20] (if elementary processes are sequential), as Petri nets with superposition of transitions, or as automata with synchronization. We initially consider Petri nets for modeling and visualization, and mainly automata for the subsequent analysis.

The well-known dining philosophers problem serves as a running example. A philosopher shows four activities: he or she thinks, gets two forks, eats, and puts the two forks back. N philosophers sit around a table and share a total of N forks, i.e., philosopher i shares the fork to his left with i - 1 and the fork to his right with i + 1; the index is taken modulo N to match the cyclic arrangement of the philosophers. We model this by a set of N processes, where each process relates to one philosopher and the fork to his left. The i-th process is synchronized with the (i+1)-th and (i-1)-th processes via access to the forks. This example can be nicely visualized and formalized as a Petri net.

Definition 1. Place/transition net

A P/T-net is a 5-tuple $(P, T, I^-, I^+, M_0)$ where $P$ and $T$ are non-empty, finite, and disjoint sets ($P \cap T = \emptyset$), $I^-, I^+ : P \times T \to \mathbb{N}_0$ are the incidence functions, and $M_0 : P \to \mathbb{N}_0$ is the initial marking, as a special case of a marking $M : P \to \mathbb{N}_0$.
Fig. 1. A dining philosopher with one of his forks

An advantage of P/T-nets is their visual representation: places are circles, transitions are boxes, and incidence functions are directed, weighted arcs, where arc weights of 1 are frequently omitted for readability. The initial marking is represented either as a number of dots (tokens) or as numbers at the corresponding places. Let ${}^\bullet t = \{p \in P \mid I^-(p,t) > 0\}$ denote the preset of a transition $t$, and $t^\bullet = \{p \in P \mid I^+(p,t) > 0\}$ denote its postset. The left part of Fig. 1 shows a philosopher process $i$ of our example with his fork to his right, and the right part of Fig. 1 shows the neighboring process $i+1$. The dynamic behavior of a P/T-net results from the enabling and firing of its transitions: a transition $t$ is enabled at a marking $M$ if $M(p) \geq I^-(p,t)$ for all $p \in {}^\bullet t$; e.g., the initial marking in Fig. 1 enables transitions $\mathit{think}_i$ and $\mathit{think}_{i+1}$. An enabled transition $t$ at marking $M$ fires and yields the successor marking $M'(p) = M(p) + I^+(p,t) - I^-(p,t)$. Starting from the initial marking $M_0$, successive application of the firing rule to all enabled transitions yields the set of reachable markings and a reachability graph. The latter is a directed graph whose nodes are the reachable markings and whose arcs result from the firing of transitions. If arcs are labeled with the corresponding transition identifier, we obtain a labeled state transition system. The set of reachable markings of an isolated P/T-net of process $i$ is given in the table to the right of Fig. 2; index $i$ is omitted for readability, and only places with a marking greater than 0 are denoted.
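The enabling rule, the firing rule and the construction of the reachability graph translate directly into code. The sketch below is our own illustration, not part of the APNN toolbox. The class `PTNet` is hypothetical, and since the net of Fig. 1 is not reproduced here, the usage relies on a made-up three-transition cycle (places `idle`, `hungry`, `eating`, `fork`) rather than the exact philosopher net.

```python
from collections import deque
from typing import Dict, Set, Tuple

Marking = Dict[str, int]


class PTNet:
    """P/T-net given by its incidence functions, stored per transition."""

    def __init__(self, pre: Dict[str, Dict[str, int]], post: Dict[str, Dict[str, int]]):
        self.pre = pre      # I^-(p, t) stored as pre[t][p]; every transition needs an entry
        self.post = post    # I^+(p, t) stored as post[t][p]

    def enabled(self, m: Marking, t: str) -> bool:
        # t is enabled at M if M(p) >= I^-(p, t) for all p in the preset of t
        return all(m.get(p, 0) >= w for p, w in self.pre[t].items())

    def fire(self, m: Marking, t: str) -> Marking:
        # M'(p) = M(p) + I^+(p, t) - I^-(p, t); places with 0 tokens are dropped
        m2 = dict(m)
        for p, w in self.pre[t].items():
            m2[p] = m2.get(p, 0) - w
        for p, w in self.post.get(t, {}).items():
            m2[p] = m2.get(p, 0) + w
        return {p: k for p, k in m2.items() if k > 0}

    def reachability_graph(self, m0: Marking):
        """Breadth-first exploration; terminates only if the reachability set is finite."""
        def key(m: Marking) -> Tuple:
            return tuple(sorted((p, k) for p, k in m.items() if k > 0))

        states: Set[Tuple] = {key(m0)}
        arcs: Set[Tuple] = set()
        queue = deque([m0])
        while queue:
            m = queue.popleft()
            for t in sorted(self.pre):
                if self.enabled(m, t):
                    m2 = self.fire(m, t)
                    arcs.add((key(m), t, key(m2)))     # labeled arc of the reachability graph
                    if key(m2) not in states:
                        states.add(key(m2))
                        queue.append(m2)
        return states, arcs
```

For the made-up cycle `think: idle -> hungry`, `get: hungry + fork -> eating`, `put: eating -> idle + fork`, started with one token on `idle` and one on `fork`, the reachability graph has three markings and three labeled arcs.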

We consider P/T-nets with synchronization: two P/T-nets are synchronized by fusion of those transitions that are selected for synchronization. This allows processes with rendezvous-type synchronization to be described in a natural manner; e.g., Fig. 1 shows two philosophers $i$ and $i+1$ that are synchronized by merging transitions $\mathit{get}_i$ and $\mathit{put}_i$.

Fig. 2. Reachability graph of philosopher with one fork and its reachability set

A general concept of synchronization uses labels for transitions, i.e., each transition is labeled with some label from a finite set of labels. In a composition, identically labeled transitions are fused. A similar form of synchronization is used in process algebras [20] and different classes of Petri nets [5,6]. If the reachability
set is finite, then the reachability graph of a P/T-net can be interpreted as a finite 
automaton. Transition labels in the automaton result from the corresponding 
transition labels in the P/T-net. We will not consider P/T-nets further, because all analysis steps we present in the sequel use the description of a system as a set of synchronized automata. Whether the automata result from the reachability graph of a Petri net or from the derivation graph of a process algebra specification is not relevant for the analysis. We have used a Petri net formalism for specification since it has a nice visual representation. However, for other purposes process algebras might be more adequate to specify components.

Definition 2. An automaton is a 4-tuple $A = (S, \delta, s_0, L)$ where $S = \{0, 1, \ldots, n-1\}$¹ is the set of states with cardinality $n$ and initial state $s_0 \in S$. $\delta \subseteq S \times S \times L$ is the state transition relation for a finite set of labels $L$, where a state transition from a state $s_x$ to a state $s_y$ carries a label $l \in L$, such that $(s_x, s_y, l) \in \delta$.

We consider non-deterministic automata, such that $\delta$ is a relation and not necessarily a function. An automaton can be represented as a node- and arc-labeled graph, e.g. by the reachability graph of a Petri net, or as a sum of Boolean adjacency matrices $Q_l$, where $Q_l(x,y) = 1$ if $(s_x, s_y, l) \in \delta$ and 0 otherwise. The reachability graph of a dining philosopher is shown on the left side of Fig. 2 and can be represented by 6 matrices with altogether 9 non-zero elements. It is often not necessary to distinguish all labels at the automata level. Thus we adopt the hiding mechanism of process algebras and use the convention that transitions which need not be distinguished result in unlabeled arcs in the automaton. If we consider in the example only those transitions that are required for synchronization, then we obtain the graph shown on the right side of Fig. 2. This graph can be represented by 5 matrices with 9 non-zero elements.
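To make the matrix view concrete, the following sketch splits a labeled transition relation into one Boolean matrix per label and applies hiding by renaming invisible labels to a single unlabeled arc. The three-state automaton and the name `tau` for unlabeled arcs are hypothetical illustrations, not the philosopher automaton of Fig. 2.

```python
def label_matrices(n, delta):
    """One Boolean n x n adjacency matrix Q_l per label l:
    Q_l[x][y] is True iff (s_x, s_y, l) is in delta."""
    mats = {}
    for x, y, label in delta:
        q = mats.setdefault(label, [[False] * n for _ in range(n)])
        q[x][y] = True
    return mats

def nonzeros(mats):
    """Total number of non-zero entries over all label matrices."""
    return sum(v for q in mats.values() for row in q for v in row)

def hide(delta, visible):
    """Hiding: every label outside 'visible' becomes the unlabeled arc 'tau'."""
    return {(x, y, l if l in visible else "tau") for x, y, l in delta}

# Hypothetical 3-state automaton (not the philosopher of Fig. 2).
delta = {(0, 1, "think"), (1, 2, "get"), (2, 0, "put"), (2, 1, "eat")}
full = label_matrices(3, delta)                          # 4 matrices
hidden = label_matrices(3, hide(delta, {"get", "put"}))  # get, put, tau
```

Hiding reduces the number of matrices; the number of non-zero elements only drops if two hidden arcs share both source and target state, which mirrors the 6-to-5 matrix reduction with unchanged non-zero count reported for Fig. 2.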

For synchronization between a set of $N$ automata $A_1, A_2, \ldots, A_N$ we use the index to characterize the different automata and define the synchronized automaton $A = A_1 | A_2 | \ldots | A_N = (\times_{i=1}^N S_i, \delta, (s_0^1, s_0^2, \ldots, s_0^N), \cup_{i=1}^N L_i)$ and the set of synchronization labels $LS = \{l \in L \mid \exists i \neq j : l \in L_i \wedge l \in L_j\}$. The number of states in $S_i$ is denoted as $n_i$, which implies that $\times_{i=1}^N S_i$ includes $\prod_{i=1}^N n_i$ states.

¹ The state space of an automaton is isomorphic to a finite set of integers; depending on the context we use the notation $x$ or $s_x$ for the $x$-th state in the set.
Synchronization is of the rendez-vous type and refers to equal labels in $A_i$ and $A_j$, i.e. neither $A_i$ nor $A_j$ is able to perform $l \in LS$ independently of the other after synchronization.

If one represents $\delta$ as a Boolean adjacency matrix, the synchronization of automata yields a matrix description of $\delta$ in the dimension of the cross product of the automata state spaces. $\delta$ has a space-efficient compositional representation as a sum of Kronecker products, see [11,22].

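Definition 3. Kronecker product, Kronecker sum

Let $Q^1, \ldots, Q^N$ be square matrices of dimension $(k_i \times k_i)$; then their Kronecker product $Q = \bigotimes_{i=1}^N Q^i$ is defined by $Q(x,y) = \prod_{i=1}^N Q^i(x_i, y_i)$, where $x = \sum_{i=1}^N x_i g_i$ and $y = \sum_{i=1}^N y_i g_i$ with weights $g_N = 1$ and $g_i = k_{i+1} g_{i+1}$ (for $i < N$). The Kronecker sum $B = \bigoplus_{i=1}^N Q^i$ is then given by $B = \sum_{i=1}^N I_{l_i} \otimes Q^i \otimes I_{r_i}$, where $I_{l_i}$ and $I_{r_i}$ are identity matrices of dimension $l_i \times l_i$ resp. $r_i \times r_i$ with $l_i = \prod_{j=1}^{i-1} k_j$ and $r_i = \prod_{j=i+1}^{N} k_j$, and $I(a,b) = 1$ iff $a = b$ and 0 otherwise.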
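Definition 3 can be checked on small Boolean matrices with the following pure-Python sketch. It builds the dense product explicitly, which the approach itself is designed to avoid, so it only serves to illustrate the index arithmetic; the function names are hypothetical and the weights follow the lexicographic convention $g_N = 1$.

```python
from itertools import product
from math import prod

def weights(dims):
    """Mixed-radix weights g_N = 1, g_i = k_{i+1} * g_{i+1} (lexicographic order)."""
    g = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        g[i] = dims[i + 1] * g[i + 1]
    return g

def kron_bool(mats):
    """Boolean Kronecker product: Q(x, y) = AND_i Q^i(x_i, y_i), where the
    global index x = sum_i x_i * g_i enumerates tuples lexicographically."""
    dims = [len(q) for q in mats]
    tuples = list(product(*(range(k) for k in dims)))  # position == sum x_i*g_i
    return [[all(q[a][b] for q, a, b in zip(mats, xs, ys)) for ys in tuples]
            for xs in tuples]

def identity(k):
    return [[a == b for b in range(k)] for a in range(k)]

def kron_sum_bool(mats):
    """Boolean Kronecker sum: OR over i of I_{l_i} (x) Q^i (x) I_{r_i} with
    l_i = k_1 * ... * k_{i-1} and r_i = k_{i+1} * ... * k_N."""
    dims = [len(q) for q in mats]
    size = prod(dims)
    total = [[False] * size for _ in range(size)]
    for i, q in enumerate(mats):
        term = kron_bool([identity(prod(dims[:i])), q, identity(prod(dims[i + 1:]))])
        total = [[a or b for a, b in zip(r, s)] for r, s in zip(total, term)]
    return total
```

For two copies of the 2-state matrix with a single arc $0 \to 1$, the Kronecker sum has 4 non-zero entries: each component may move while the other one stays put, i.e. the interleaving of independent local transitions.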

Kronecker operations apply not only to real-valued matrices but also to Boolean matrices if addition is defined as Boolean or and multiplication as Boolean and. We focus on Boolean matrices. A Kronecker product formalizes the operation of multiplying every matrix element of one matrix with all matrix elements of the other matrices; these products of matrix elements are arranged in lexicographical order in the resulting matrix (for more details see, e.g., [11]). The key observation is that the fact “synchronization successfully takes place if all processes agree (join) to the rendez-vous” can be formally expressed by the nonzero result of a product, i.e., if all terms are nonzero (= all processes agree), the synchronization can take place. If an automaton $A_i$ is not involved in a synchronization for label $l$, i.e. if $l \notin L_i$, then we define $Q_l^i = I_{n_i}$, the identity matrix of dimension $n_i \times n_i$. For $l \in L_i$, $Q_l^i$ is an $n_i \times n_i$ matrix representing all $l$-labeled transitions in automaton $i$. Hence one can represent $\delta$ by $\sum_{l \in L} \bigotimes_{i=1}^N Q_l^i$. This is an extremely space-efficient representation of $\delta$, since for a $\prod_{i=1}^N n_i \times \prod_{i=1}^N n_i$ matrix we use only $|L| \cdot N$ matrices of dimension $n_i \times n_i$ (where in practice many of these matrices are identity matrices $I$ that need not be stored at all).
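The representation $\delta = \sum_{l \in L} \bigotimes_{i=1}^N Q_l^i$ lets one compute the successors of a single global state without ever forming the product matrix. A minimal sketch, assuming a hypothetical encoding where a label missing from component $i$ means $Q_l^i = I$; the two 2-state automata are invented for illustration.

```python
from itertools import product

def successors(state, labels, local):
    """Successors of a global state (s_1, ..., s_N) under
    delta = sum_l (x)_i Q_l^i, computed without building the product matrix.

    local[i] maps a label l to the Boolean matrix Q_l^i of automaton i; a
    missing label means l is not in L_i, i.e. Q_l^i = I (component i stays put)."""
    result = set()
    for l in labels:
        choices = []
        for i, s in enumerate(state):
            q = local[i].get(l)
            if q is None:
                choices.append([s])
            else:
                choices.append([y for y, v in enumerate(q[s]) if v])
        # If some participating automaton has no l-successor, its choice list is
        # empty and so is the product: the rendez-vous needs every partner.
        result.update(product(*choices))
    return result

F, T = False, True
# Hypothetical pair of 2-state automata; "sync" is shared, "a" and "b" are local.
local = [
    {"a": [[F, T], [F, F]], "sync": [[F, F], [T, F]]},
    {"sync": [[F, T], [F, F]], "b": [[F, F], [T, F]]},
]
labels = ["a", "b", "sync"]
```

Apart from a small per-label overhead, the work is proportional to the number of successors generated, in line with the claim made below for the Kronecker representation.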

Let $R$ denote the set of reachable states which results from the reflexive, transitive closure of $\delta$ starting from the initial state $s_0$. Obviously the set of reachable states $R$ of $A = A_1 | \ldots | A_N$ is a subset of $\times_{i=1}^N S_i$, in general a proper one due to synchronization. The set $\times_{i=1}^N S_i$ can be represented as a tree structure of $N$ levels, where nodes at level $i$ have $n_i$ sons, such that a path in this tree corresponds to a state $(s_1, \ldots, s_N)$. If one removes all paths from this structure which refer to unreachable states, one trivially obtains a representation of $R$, where nodes at level $i$ provide the reachable fraction of $S_i$. Let $R(s_1, \ldots, s_i) = \{x \in R \mid x_1 = s_1 \wedge \ldots \wedge x_i = s_i\}$ denote the subset of states in $R$ which match the given states of automata $1, \ldots, i$. Then $R(s_1, \ldots, s_i)$ refers to a subtree with a root node at level $i+1$. Let the root node of the subtree $R(s_1, \ldots, s_{i-1})$ be denoted as $R_i(s_1, \ldots, s_{i-1})$, which is the subset of reachable states in $S_i$ for the context of states $s_1, \ldots, s_{i-1}$. This notion reflects a sort of conditional reachability set. Consequently, $R() = R$ refers to the whole tree and $R_1()$ refers to the root node.
Fig. 3. Tree a) and 2 DAG representations b), c) of the reachability set for N=3

Fig. 3 a) shows a tree representation of the philosopher model with 3 processes. This model contains $|R| = 32$ reachable states of $6^3 = 216$ potential states. The tree consists of 23 nodes and 22 arcs. It contains 32 paths to encode the triples which describe a state within $R$. Employing binary search at each node, one needs $O(\sum_{i=1}^N \log n_i)$ to search a certain path (state). Nodes at a level $i$ belong to a single automaton $A_i$, but these nodes can vary in their cardinality according to the conditional reachability. Several nodes and subtrees are equal, which indicates redundancy. As for ordered binary decision diagrams (OBDDs), isomorphic subtrees do not require more than a single representation. Two trees $R(s_1, \ldots, s_i)$ and $R(s'_1, \ldots, s'_i)$ are equal iff $R_{i+1}(s_1, \ldots, s_i) = R_{i+1}(s'_1, \ldots, s'_i)$ and all pairs of subtrees $R(s_1, \ldots, s_i, s_{i+1})$, $R(s'_1, \ldots, s'_i, s'_{i+1})$ with $s_{i+1} = s'_{i+1}$ are equal. Clearly at the bottom level, this simplifies to equality of sets $R_N(s_1, \ldots, s_{N-1}) = R_N(s'_1, \ldots, s'_{N-1})$. If we apply a folding operation similar to OBDDs in a bottom-up manner, we obtain a unique directed acyclic graph (DAG) whose set of paths is equal to the set of paths in the tree. A DAG representation of $R$ saves space, but the effort to find a state/path remains the same as in the tree representation. The corresponding DAG in Fig. 3 b) uses only 7 nodes and 14 arcs to represent 32 paths (triples, states).
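The bottom-up folding can be sketched with hash-consing: a node is built only after its children, and structurally equal nodes receive the same id, so isomorphic subtrees are stored once. This is an illustrative sketch with hypothetical names and a small invented reachability set, not the toolbox data structure.

```python
from bisect import bisect_left

def build_dag(states, depth):
    """Fold the tree of a set of depth-tuples bottom-up into a DAG: equal
    subtrees are stored once (hash-consing). A node is a sorted tuple of
    (local state, child id) pairs; children at the bottom level are None."""
    table = {}                                   # node -> unique id

    def build(group, level):
        if level == depth - 1:
            node = tuple(sorted((s[level], None) for s in group))
        else:
            sub = {}
            for s in group:
                sub.setdefault(s[level], []).append(s)
            node = tuple(sorted((v, build(g, level + 1)) for v, g in sub.items()))
        return table.setdefault(node, len(table))

    root = build(list(states), 0)
    return root, {i: node for node, i in table.items()}

def contains(root, nodes, state):
    """Follow the path of state, using binary search inside each node."""
    node = root
    for s in state:
        entries = nodes[node]
        keys = [v for v, _ in entries]       # a real implementation stores these
        k = bisect_left(keys, s)
        if k == len(keys) or keys[k] != s:
            return False
        node = entries[k][1]
    return True

# Hypothetical reachability set over two automata.
R = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)}
root, nodes = build_dag(R, 2)
```

Here the tree would need 4 nodes, while the DAG needs 3, because the leaf node $\{0, 1\}$ below local states 0 and 1 is shared; the membership test still visits one node per level.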

For some analysis algorithms, e.g. in performance analysis based on CTMC analysis [16,9], it is important to be able to assign specific information to a single state during analysis. For such algorithms a unique, bijective mapping $m: R \to \{0, 1, \ldots, |R|-1\}$ has to be known. If one applies a lexicographical order on states in $R$, $m$ simply assigns an index according to this total order on the elements of $R$. The mapping for lexicographical order can be integrated into the DAG structure if one recognizes that one basically has to count the number of leaves to the left of the path of a state $s$ in the tree. The number of leaves is obviously equal among isomorphic subtrees, such that it remains invariant under the folding operation, as indicated by the arc labels in Fig. 3 b). Consequently, by assigning corresponding weights to the arcs of the DAG, one is able to evaluate $m$ for a path $s_1, \ldots, s_N$ in the DAG by summing, at each node $R_i(s_1, \ldots, s_{i-1})$, the weights of the arcs which leave from positions to the left of $s_i$, plus the position of $s_N$ in $R_N(s_1, \ldots, s_{N-1})$. Clearly an implementation will precompute such weights to avoid the local summation at a node $s_i$. Fig. 3 c) provides these values as arc labels for our example. Note that the position of state (2,3,5) results from the summation of arc labels 20 and 2 and the position of $s_3 = 5$ inside the leaf node, which is 1, such that we obtain $m(2,3,5) = 23$, which is correct since $m(0,0,0) = 0$.
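The leaf-counting scheme for $m$ can be sketched on top of a hash-consed DAG. The sketch recomputes the running sums on the fly (with memoized leaf counts per node) instead of storing the precomputed cumulative arc labels of Fig. 3 c); names and data are hypothetical.

```python
from functools import lru_cache

def build_dag(states, depth):
    """Share equal subtrees of the tree of depth-tuples (hash-consing).
    Returns the root id and a map id -> node, where a node is a sorted tuple
    of (local state, child id) pairs and children at the bottom are None."""
    table = {}

    def build(group, level):
        if level == depth - 1:
            node = tuple(sorted((s[level], None) for s in group))
        else:
            sub = {}
            for s in group:
                sub.setdefault(s[level], []).append(s)
            node = tuple(sorted((v, build(g, level + 1)) for v, g in sub.items()))
        return table.setdefault(node, len(table))

    root = build(list(states), 0)
    return root, {i: node for node, i in table.items()}

def lex_index(root, nodes, state):
    """m(s) = number of states lexicographically smaller than s: at every node,
    add the leaf counts of all arcs left of s_i. Leaf counts are equal for
    isomorphic subtrees, so they survive the folding into a DAG."""
    @lru_cache(maxsize=None)
    def leaves(node_id):
        if node_id is None:                    # one leaf per bottom-level entry
            return 1
        return sum(leaves(child) for _, child in nodes[node_id])

    m, node = 0, root
    for s in state:
        for v, child in nodes[node]:
            if v == s:
                node = child
                break
            m += leaves(child)                 # weight of an arc left of s_i
        else:
            raise KeyError(f"{state} is not in R")
    return m

R = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)}   # hypothetical reachability set
root, nodes = build_dag(R, 2)
```

For state (2,0), the arcs for local states 0 and 1 at the root each carry weight 2, and 0 is the first entry of its leaf node, so $m(2,0) = 4$, its position in lexicographic order.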

So far we have considered space-efficient representations of reachability graphs by Kronecker algebra and of reachability sets by DAGs. With these representations it is possible to analyze models with very large state spaces. From the DAG representation of $R$, reachability of a specific state can be decided in $O(\sum_{i=1}^N \log n_i)$. From the Kronecker representation of $\delta$, all successors of a state can be computed in time proportional to the number of successor states. Within reachability analysis, the Kronecker representation can be further exploited using two observations: 1) transitions with labels $l \notin LS$ can occur locally and independently in the components. Thus it is possible to define some canonical ordering among those transitions instead of considering all interleavings when computing the successor states of a state during reachability analysis. 2) The state ordering defined by the Kronecker representation implies a perfect hash function, such that a bit vector of length $\prod_{i=1}^N n_i$ is sufficient to decide in $O(1)$ whether a state has been reached or not. For details about the algorithm and its performance see [17].
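Observation 2) can be illustrated by a visited set whose hash function is the lexicographic mixed-radix index of Definition 3. This is a hypothetical sketch: it allocates $\prod_{i=1}^N n_i$ bits, so it only pays off when that product fits into memory, as assumed in the text.

```python
from math import prod

class VisitedSet:
    """Bit vector over the full product space S_1 x ... x S_N. The mixed-radix
    index of a state (lexicographic order) is a collision-free hash."""

    def __init__(self, dims):
        self.dims = dims
        self.bits = bytearray((prod(dims) + 7) // 8)

    def index(self, state):
        x = 0
        for s, k in zip(state, self.dims):   # Horner scheme for sum s_i * g_i
            x = x * k + s
        return x

    def add(self, state):
        x = self.index(state)
        self.bits[x >> 3] |= 1 << (x & 7)

    def __contains__(self, state):
        x = self.index(state)
        return bool(self.bits[x >> 3] & (1 << (x & 7)))
```

For three philosopher automata with 6 states each, the vector needs 216 bits (27 bytes), and state (2,3,5) maps to bit $2 \cdot 36 + 3 \cdot 6 + 5 = 95$.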

A well-known but orthogonal approach to reduce complexity is state aggregation based on equivalence relations. The goal is to reduce the number of states and transitions but to retain the possibility to compute the required results. A successful application requires an equivalence relation which preserves the required results, and an aggregation algorithm which is faster than the analysis of the original system. The latter usually requires a congruence relation with respect to composition via synchronous transitions, such that
aggregation can be applied for each subsystem at the automata level and the 
combination of aggregated subsystems gives an aggregated but still equivalent 
overall system. In this case, the Kronecker representation is convenient to com- 
bine aggregation and composition, because aggregates are computed at the level 
of automata matrices yielding a matrix description of the aggregated automaton 
which can be used instead of the original matrices in the Kronecker representa- 
tions. We briefly outline the steps of equivalence computation in the context of 
Kronecker based analysis. 

A large number of equivalence relations have been proposed in the literature (see e.g. [24] for equivalences in the context of Petri nets); equivalences of the bisimulation type are especially popular. An equivalence relation $\mathcal{R}_i \subseteq S_i \times S_i$ is a bisimulation if for all $(s_x, s_y) \in \mathcal{R}_i$ and all $l \in L_i$:

1. $Q_l^i(x,z) = 1$ implies $Q_l^i(y,z') = 1$ for some $z'$ with $(s_z, s_{z'}) \in \mathcal{R}_i$, and

2. $Q_l^i(y,z) = 1$ implies $Q_l^i(x,z') = 1$ for some $z'$ with $(s_z, s_{z'}) \in \mathcal{R}_i$.

The bisimulation relation with the least number of equivalence classes can be computed for finite state systems as the fixed point of the following partition refinement:

$$\mathcal{R}_i^{k+1} = \{(s_x, s_y) \in \mathcal{R}_i^k \mid \forall l \in L_i: (Q_l^i(x,z) = 1 \Rightarrow \exists z': (s_z, s_{z'}) \in \mathcal{R}_i^k \wedge Q_l^i(y,z') = 1) \wedge (Q_l^i(y,z) = 1 \Rightarrow \exists z': (s_z, s_{z'}) \in \mathcal{R}_i^k \wedge Q_l^i(x,z') = 1)\},$$

where $\mathcal{R}_i^0$ is some initial relation. For a bisimulation, the initial relation $\mathcal{R}_i^0 = S_i \times S_i$ is sufficient. Efficient algorithms to compute the fixed point are known [21,15] and implemented in analysis tools [10]. Given a bisimulation relation $\mathcal{R}_i$, an aggregated automaton can be built by substituting every equivalence class by a single state. Let $\tilde{n}_i$ be the number of equivalence classes and $\mathcal{R}_i[s_x]$ be the equivalence class of $s_x$. Then the aggregated automaton is defined by $\tilde{n}_i \times \tilde{n}_i$ matrices $\tilde{Q}_l^i$, where $\tilde{Q}_l^i(X,Y) = 1$ for classes $X$ and $Y$ if there exist $s_x \in X$ and $s_y \in Y$ with $Q_l^i(x,y) = 1$. All remaining elements are 0. Matrices $\tilde{Q}_l^i$ can then be used instead of $Q_l^i$ in the Kronecker representation, reducing the state space of the composed model by a factor of $\tilde{n}_i / n_i$. Since bisimulation is a congruence, the composed models using the original and the aggregated automaton are bisimulation equivalent.
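The fixed point iteration and the construction of the aggregated matrices can be sketched as follows. This is a deliberately naive, signature-based refinement (roughly $O(n^2 |L|)$ work per round), swapped in for readability; it is not one of the efficient algorithms of [21,15]. Names and the 4-state example are hypothetical.

```python
def coarsest_bisimulation(mats, n):
    """Naive partition refinement: R^0 = S x S (one class), then split classes
    until all states of a class reach the same set of classes by every label.
    mats maps label l to the n x n Boolean matrix Q_l; returns block[x]."""
    labels = sorted(mats)
    block = [0] * n
    while True:
        sig = [(block[x],) + tuple(frozenset(block[z] for z in range(n) if mats[l][x][z])
                                   for l in labels)
               for x in range(n)]
        ids = {}
        new = [ids.setdefault(s, len(ids)) for s in sig]
        if len(ids) == len(set(block)):        # nothing was split: fixed point
            return new
        block = new

def aggregate(mats, block):
    """Q~_l(X, Y) = 1 iff Q_l(x, y) = 1 for some x in class X and y in class Y."""
    k = max(block) + 1
    agg = {}
    for l, q in mats.items():
        a = [[False] * k for _ in range(k)]
        for x, row in enumerate(q):
            for y, v in enumerate(row):
                if v:
                    a[block[x]][block[y]] = True
        agg[l] = a
    return agg

F, T = False, True
# Hypothetical automaton: "a" moves 0->1 and 2->3, "b" moves 1->0 and 3->2.
mats = {"a": [[F, T, F, F], [F, F, F, F], [F, F, F, T], [F, F, F, F]],
        "b": [[F, F, F, F], [T, F, F, F], [F, F, F, F], [F, F, T, F]]}
block = coarsest_bisimulation(mats, 4)
agg = aggregate(mats, block)
```

Because the current class is part of each signature, every round refines the previous partition; an unchanged number of classes therefore means the fixed point has been reached. In the example, states 0 and 2 as well as 1 and 3 are merged into a 2-state aggregate.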

Bisimulation relations can also be defined for modified forms of automata. A popular modification yielding a less discriminating equivalence results from hiding some labels. We define a label $l$ to be local for automaton $i$ if $l \in L_i$ and $l \notin LS$. Let $L_i^{loc}$ be the set of local labels for automaton $i$. Observe that $Q_l^j = I$ for $l \in L_i^{loc}$ and $i \neq j$. Let $Q_\tau^i = \sum_{l \in L_i^{loc}} Q_l^i$. It is easy to show that in the Kronecker representation of $\delta$ the terms involving $Q_l^i$ ($l \in L_i^{loc}$) can be substituted by a single term involving $Q_\tau^i$ and matrices $Q_\tau^j = I$ for $j \neq i$. In contrast to the original system, different labels from $L_i^{loc}$ can no longer be distinguished. Define $(Q_\tau^i)^* = \sum_{k=0}^{\infty} (Q_\tau^i)^k$. A transitive and reflexive closure algorithm computes $(Q_\tau^i)^*$ within $O(n_i^3)$. A bisimulation equivalence for the automaton results from applying the partition refinement to the matrices $(Q_l^i)^* = (Q_\tau^i)^* Q_l^i (Q_\tau^i)^*$ for all $l \in L_i \setminus L_i^{loc}$. This equivalence relation hides the details of transitions from $L_i^{loc}$ and still preserves the external behavior visible via labels $l \in L_i \setminus L_i^{loc}$. Corresponding bisimulations are usually denoted as weak bisimulations. Labels which should not be observed in functional analysis and which are not required for synchronization can be collected in $L_i^{loc}$ and hidden in the aggregated automaton.
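The closure and the weak matrices $(Q_\tau^i)^* Q_l^i (Q_\tau^i)^*$ can be computed as in the following sketch, which uses Warshall's algorithm for the $O(n_i^3)$ reflexive-transitive closure. The helper names and the 3-state example (with "think" as the only local label) are hypothetical.

```python
def closure(q):
    """Reflexive-transitive closure (Q_tau)* of a Boolean matrix (Warshall, O(n^3))."""
    n = len(q)
    c = [[q[x][y] or x == y for y in range(n)] for x in range(n)]
    for k in range(n):
        for x in range(n):
            if c[x][k]:
                for y in range(n):
                    if c[k][y]:
                        c[x][y] = True
    return c

def bool_mult(a, b):
    """Boolean matrix product (or of ands)."""
    return [[any(a[x][k] and b[k][y] for k in range(len(b))) for y in range(len(b[0]))]
            for x in range(len(a))]

def weak_matrices(mats, local):
    """Hide local labels: Q_tau is the Boolean sum of Q_l over local labels, and
    every remaining label l gets the weak matrix (Q_tau)* Q_l (Q_tau)*."""
    n = len(next(iter(mats.values())))
    q_tau = [[any(mats[l][x][y] for l in local) for y in range(n)] for x in range(n)]
    star = closure(q_tau)
    weak = {l: bool_mult(bool_mult(star, q), star)
            for l, q in mats.items() if l not in local}
    return weak, star

F, T = False, True
# Hypothetical 3-state cycle: think 0->1 (local), get 1->2, put 2->0.
mats = {"think": [[F, T, F], [F, F, F], [F, F, F]],
        "get":   [[F, F, F], [F, F, T], [F, F, F]],
        "put":   [[F, F, F], [F, F, F], [T, F, F]]}
weak, star = weak_matrices(mats, {"think"})
```

After hiding, "get" is possible from state 0 as well (via an invisible "think" step), and "put" can end in state 0 or 1; the partition refinement sketched earlier can then be run on `weak` to obtain a weak bisimulation.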

The bisimulation equivalences considered so far do not preserve the reachability of states: if state $s_x$ is reachable in the aggregated automaton, then at least one state $s_y \in \mathcal{R}_i[s_x]$ is reachable in the original automaton, but it is not clear whether all states, or more than one state, of the class are reachable. The reason is that the equivalence relations consider only the future and not the past behavior. If results with respect to specific states are to be computed, the equivalences have to be extended. To formalize the approach, states are characterized by labels and an equivalence relation is computed by refining $\mathcal{R}_i^0$, where $(s_x, s_y) \in \mathcal{R}_i^0$ iff $s_x$ and $s_y$ are identically labeled (see [10]). This approach works if the number of state labels is relatively small. However, in cases with a large number of state labels, the resulting number of equivalence classes will also be large. In particular, if the set of reachable states is to be computed, the aggregation presented so far does not help, because in the initial relation $\mathcal{R}_i^0$ each equivalence class contains a single state.

For reachability analysis we have to consider the past instead of the future behavior, because reachability of a state implies the existence of a path from the initial state. An equivalence relation preserving reachability has been introduced in [7]. Let $\mathcal{R}_i \subseteq S_i \times S_i$ be an equivalence relation such that $(s_0, s_x) \in \mathcal{R}_i$ implies $(Q_\tau^i)^*(0,x) = 1$ (i.e., all states which are in the same equivalence class as the initial state are reachable by internal transitions from the initial state) and for all $(s_x, s_y) \in \mathcal{R}_i$ and all $l \in L_i \setminus L_i^{loc}$:

1. $(Q_l^i)^*(z,x) = 1$ implies $(Q_l^i)^*(z',y) = 1$ for some $z'$ with $(s_z, s_{z'}) \in \mathcal{R}_i$, and

2. $(Q_l^i)^*(z,y) = 1$ implies $(Q_l^i)^*(z',x) = 1$ for some $z'$ with $(s_z, s_{z'}) \in \mathcal{R}_i$.

Fig. 4. Reachability graph of an aggregated philosopher a) and DAG representation of the aggregated reachability set b).

The relation can be denoted as a weak inverse bisimulation since it considers 
incoming instead of outgoing transitions. The largest weak inverse bisimulation 
can be computed using a partition refinement algorithm with the transposed 
instead of the original matrices. As shown in [7], $(s_x, s_y) \in \mathcal{R}_i$ for some weak inverse bisimulation $\mathcal{R}_i$ implies that if automaton $i$ can be in state $s_x$ after a sequence of synchronized transitions, then it can as well be in state $s_y$, and vice versa. This behavior is exploited for an efficient reachability analysis. In a first step, aggregated automata with respect to weak inverse bisimulation are computed; then reachability analysis for the complete system is performed using the aggregated instead of the original automata. If state $(s_1, \ldots, s_N)$ is reachable in the aggregated system, then all states from $\mathcal{R}_1[s_1] \times \mathcal{R}_2[s_2] \times \ldots \times \mathcal{R}_N[s_N]$ are reachable in the original system. If $(s_1, \ldots, s_N)$ is not reachable, then all states from $\mathcal{R}_1[s_1] \times \mathcal{R}_2[s_2] \times \ldots \times \mathcal{R}_N[s_N]$ are unreachable too. Exploiting this result allows a very time-efficient generation and space-efficient representation of huge reachability sets; for details see [8].
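The remark that the largest weak inverse bisimulation is obtained by running partition refinement on transposed matrices can be sketched as below. Assumptions: the input matrices would be the weak matrices $(Q_l^i)^*$, and the initial-state condition of [7] is not derived here; a caller could only encode it through the initial partition `block`. All names and the 4-state example are hypothetical.

```python
def transpose(q):
    return [list(col) for col in zip(*q)]

def refine(mats, block):
    """Signature-based partition refinement, starting from the partition
    'block' (block[x] = class of state x), until no class is split."""
    n = len(block)
    labels = sorted(mats)
    while True:
        sig = [(block[x],) + tuple(frozenset(block[z] for z in range(n) if mats[l][x][z])
                                   for l in labels)
               for x in range(n)]
        ids = {}
        new = [ids.setdefault(s, len(ids)) for s in sig]
        if len(ids) == len(set(block)):
            return new
        block = new

def inverse_bisimulation(mats, block):
    """Refinement on transposed matrices: compares incoming arcs (predecessor
    classes) instead of outgoing ones."""
    return refine({l: transpose(q) for l, q in mats.items()}, block)

F, T = False, True
# Hypothetical automaton: "a" leads from 0 to 1 and to 2, "b" from 1 to 3.
mats = {"a": [[F, T, T, F], [F, F, F, F], [F, F, F, F], [F, F, F, F]],
        "b": [[F, F, F, F], [F, F, F, T], [F, F, F, F], [F, F, F, F]]}
backward = inverse_bisimulation(mats, [0] * 4)
forward = refine(mats, [0] * 4)
```

The two directions differ: backward, states 1 and 2 are equivalent since both are entered only from state 0 by "a"; forward, they are separated because only state 1 can continue with "b", while states 2 and 3 (no outgoing arcs) are merged.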

If we compute equivalence relations for the philosophers in the example and declare all transitions which are not needed for synchronization as local, then the relation $\mathcal{R}$ with equivalence classes $\mathcal{R}[0] = \{0,2\}$, $\mathcal{R}[1] = \{1,3\}$ and $\mathcal{R}[2] = \{4,5\}$ is a weak bisimulation in both directions, forward and backward. Thus the aggregated automaton (Fig. 4 a) with 3 states and 4 transitions can be used instead of the original one in subsequent composition and analysis. Because $\mathcal{R}$ is a weak forward bisimulation, model checking using the aggregated instead of the original system yields identical results, and since $\mathcal{R}$ is a weak backward bisimulation, reachability analysis can be performed using the aggregated system. Thus, it is indeed possible to perform all analysis steps on the Kronecker and DAG representation of the aggregated system. Fig. 4 b) shows the DAG of the aggregated reachability set, which is more compact than the original DAG but contains the same information: e.g., reachability of aggregated state (0,2,1) implies that all 8 states from $\{0,2\} \times \{4,5\} \times \{1,3\}$ are reachable, and conversely, since state (0,1,2) is not reachable in the aggregated system, all 8 states from $\{0,2\} \times \{1,3\} \times \{4,5\}$ are unreachable in the original system.
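The expansion from aggregated to original states is just a cross product of equivalence classes. The sketch below uses the three classes given above for the philosopher automaton; the function names and the aggregated set passed to `original_count` are hypothetical.

```python
from itertools import product

# Equivalence classes of the philosopher automaton as given in the text:
# R[0] = {0, 2}, R[1] = {1, 3}, R[2] = {4, 5}.
CLASSES = {0: (0, 2), 1: (1, 3), 2: (4, 5)}

def expand(aggregated_state, classes=CLASSES):
    """Original states represented by an aggregated state, i.e. the product
    R_1[s_1] x ... x R_N[s_N]; under a weak inverse bisimulation these are
    either all reachable or all unreachable."""
    return set(product(*(classes[s] for s in aggregated_state)))

def original_count(aggregated_reachable, classes=CLASSES):
    """|R| recovered from the aggregated reachability set (hypothetical helper)."""
    return sum(len(expand(s, classes)) for s in aggregated_reachable)
```

Since different aggregated states expand to disjoint sets of original states, $|R|$ follows from the aggregated reachability set by summing the products of class sizes.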




Modular State Level Analysis of Distributed Systems Techniques 429 




Fig. 5. Structural overview of the toolbox 

3 Tool support 

Many software tools for the analysis of finite state systems exist. They differ in modeling paradigms, analysis techniques and the kind of results they compute. Often they are standalone developments, such that models cannot be interchanged. As a consequence, a fair comparison of different techniques is often hard, the combination of techniques from different tools is impossible, and a lot of redundancy results, because basic modules like state space generators, graphical interfaces etc. are often reimplemented for a specific tool. Recent efforts aim at standardized interfaces to support the interchange of models and algorithms, e.g. the ISO standardization approach for Petri nets [13], the Petri net kernel of Humboldt University [18] and the Electronic Tool Integration platform [25]. The presented analysis techniques are integrated in a toolbox, the APNN toolbox [3], which was developed with similar ideas in mind. The toolbox is based on two standardized file formats. The first is the so-called abstract Petri net notation (APNN), an extensible file format (formal grammar) for a rather general class of Petri nets, including colored, hierarchical and stochastic nets [4]. The second format is for synchronized (stochastic) automata and matches the formalism introduced in Section 2. Figure 5 gives an overview of the APNN toolbox.

Due to lack of space, we name only those parts which are relevant for the functional analysis of systems. We neglect components that deal with quantitative system analysis using techniques for Markov chain analysis or discrete event simulation. Currently two graphical interfaces are available to specify Petri net models structured in synchronized components. Both interfaces generate an APNN description of the model. The APNN description of a model is the input for analysis modules at the net level. Invariant analysis is often useful to obtain first results, especially upper bounds for token populations on the places of a Petri net, which are in turn helpful to limit the size of automata state spaces in composed nets [16].

The APNN description is read by the module for state space generation. It is 
possible to generate the state space and transition system for the complete net 
or for components only. The latter results in an automata network description 
of the model which contains all information necessary to generate the Kronecker 
representation of the complete system. Reachability graphs are stored as sparse 
matrices, one for each transition label. Automata descriptions can also be obtained from other modeling formalisms; e.g., from a process algebra specification consisting of the parallel composition of components at the highest level, the description as an automata network can easily be generated by computing the transition systems of the components.



Table 1. Sizes and efforts to generate and represent the philosophers example.
[Columns: N, |R|, |R̃|, DAG nodes, DAG size in KByte, non-zeros, time in sec.; entries not recoverable]

The automata description is the interface for different state-based analysis modules, e.g. a module for equivalence computation and the generation of reduced automata, a module to generate the reachability set using DAGs, and a module to perform model checking for computation tree logic (CTL) formulas using DAGs and Kronecker representations.

To present first results, we consider the analysis of the philosophers example using the modules of the toolbox. Tab. 1 includes results for configurations with up to 20 philosophers. The first column contains the number of philosophers; $|R|$ and $|\tilde{R}|$ are the numbers of markings in the reachability set and in the aggregated reachability set, respectively. The size of the reachability set grows very rapidly with an increasing number of philosophers, whereas the aggregated reachability set remains relatively small even for a larger number of philosophers. The following two columns include information about the DAG representing $\tilde{R}$. As shown above, knowledge of $\tilde{R}$ and the equivalence classes allows us to characterize $R$ completely. For all configurations of the example, the number of nodes in the DAG is very small compared to the size of $\tilde{R}$, let alone the size of $R$. Memory requirements to store the DAG are shown in the fifth column. Apart from the DAG, the equivalence classes and the matrices have to be stored to represent the reachability set and graph. The number of non-zero elements in all matrices which are needed to represent the reachability graph is shown in column six. The last column includes the total time required to generate the compact representations of reachability set and graph, starting with the APNN description of the model. The time is measured as “wallclock time in seconds” on a Sun UltraSparc workstation with 167 MHz CPU and 128 MByte of main memory. Since the different analysis steps are performed by single modules communicating via a file interface, the time includes the effort to load programs and to read and write files. However, even for the largest configuration with more than 15 billion reachable markings, the compact representation is generated in about half a minute and requires less than 10 KByte of memory. The compact representation can afterwards be used in further analysis steps, including model checking or performance analysis.

Observe that we did not exploit symmetries in the model or identities of different components. This would be an additional step to improve the analysis. However, we obtained similar results for non-symmetric configurations where some of the philosophers pick up forks one after the other. With the approach, even configurations with more than 20 philosophers can be handled. In this case it is preferable to group two or three philosophers into a single automaton which can be aggregated to a small automaton. The philosophers example includes some features which are common in parallel or distributed systems and which support our analysis approach. Components have some internal transitions, such that the corresponding automata can be substantially aggregated. Additionally, synchronization takes place between adjacent components and not globally. The following example is less favorable and demonstrates limitations of our approach.

4 Example 

Lamport’s mutual exclusion algorithm [19] for shared-memory systems without 
test-and-set instructions, but with atomic read and write operations is, of course, 
an academic example, but it is complex and not easy to analyze. We analyze the 
algorithm for a system consisting of N processes cycling between local computing 
and access to shared memory. Figure 6 gives the pseudo code for process i. The 
basic idea is that in systems where contention to a shared resource is rare, it 
is not efficient to inspect the state of all other processes before accessing the 
shared resource. By a sophisticated use of variables x and y it is possible to 
assure exclusive access without first scanning all other processes. However, it is 
not straightforward to describe the meaning of x and y. It is worth mentioning 
that the algorithm in its original setting is not symmetric for processes. Since in 
the for-loop processes are scanned starting with process 1 and ending with N in 
the code of each process, processes are treated in a different way depending on 
their number. The difference in process behavior destroys symmetries such that 
methods reducing state spaces due to symmetries cannot be used for the example. 
Lamport’s algorithm is considered in [2], where colored stochastic Petri nets are applied for its analysis, and in [23], where it is modeled as a network of stochastic automata. Our model is in some sense in between these approaches since we use superposed Petri nets which are mapped onto automata.

The algorithm is too complex to be described as a flat P/T net. We used a colored net with hierarchies as in [2]. The major difference is that we explicitly model the for-loop. In contrast to [2], our net describes a system with N components interacting via synchronized transitions. The net for component i contains the description of process i plus places for the variables b[i], x = i with complement place x ≠ i, and y = i with complement place y ≠ i. The first automaton additionally considers the situation y = 0. Values of the variables are described in a distributed way. Such a description is not obvious; however, the alternative, where components for the variables x and y are introduced, as in [23], results in a model where the state of the automata for the variables determines the state of the remaining automata, and the modular analysis behaves similarly to a conventional analysis of the complete net.

compute locally;
start: ⟨b[i] := true⟩;  /* start access to shared memory */
  ⟨x := i⟩;
  if ⟨y ≠ 0⟩ then
    ⟨b[i] := false⟩;
    await ⟨y = 0⟩;
    goto start fi;
  ⟨y := i⟩;
  if ⟨x ≠ i⟩ then
    ⟨b[i] := false⟩;
    for j := 1 to N do await ⟨not b[j]⟩ od;
    if ⟨y ≠ i⟩ then
      await ⟨y = 0⟩;
      goto start fi fi;
  critical section;
  ⟨y := 0⟩;
  ⟨b[i] := false⟩;  /* end access to shared memory */

Fig. 6. Lamport’s algorithm code for process i.
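For readers who want to experiment with the algorithm of Fig. 6, the following sketch performs a plain explicit-state search over all interleavings and checks mutual exclusion. It is not the Kronecker/DAG technique of this paper, and its state counts are not comparable with Tab. 2, since the modeling granularity differs. Assumptions: each ⟨…⟩ of Fig. 6 is one atomic step, the for-loop tests one b[j] per step, x starts at 0, and the pc names are made up.

```python
from collections import deque

def steps(state, n):
    """All successor states: one atomic step (one <...> of Fig. 6) of one process.
    state = (pc, j, b, x, y); pc, j, b are per-process tuples, processes are
    numbered 1..n and y = 0 means "no process". The pc names are made up."""
    pc, js, b, x, y = state
    for p in range(1, n + 1):
        i, c, j = p - 1, pc[p - 1], js[p - 1]

        def go(new_pc, new_j=j, new_b=None, new_x=x, new_y=y):
            nb = b if new_b is None else b[:i] + (new_b,) + b[i + 1:]
            return (pc[:i] + (new_pc,) + pc[i + 1:],
                    js[:i] + (new_j,) + js[i + 1:], nb, new_x, new_y)

        if c == "local":                       # compute locally
            yield go("start")
        elif c == "start":                     # <b[i] := true>
            yield go("set_x", new_b=True)
        elif c == "set_x":                     # <x := i>
            yield go("test_y", new_x=p)
        elif c == "test_y":                    # if <y != 0>
            yield go("back1" if y != 0 else "set_y")
        elif c == "back1":                     # <b[i] := false>
            yield go("await1", new_b=False)
        elif c in ("await1", "await2"):        # await <y = 0>; goto start
            if y == 0:
                yield go("start")
        elif c == "set_y":                     # <y := i>
            yield go("test_x", new_y=p)
        elif c == "test_x":                    # if <x != i>
            yield go("back2" if x != p else "cs")
        elif c == "back2":                     # <b[i] := false>, enter for-loop
            yield go("loop", new_j=1, new_b=False)
        elif c == "loop":                      # for j := 1 to N do await <not b[j]>
            if j > n:
                yield go("test_y2")
            elif not b[j - 1]:
                yield go("loop", new_j=j + 1)
        elif c == "test_y2":                   # if <y != i>
            yield go("await2" if y != p else "cs")
        elif c == "cs":                        # critical section
            yield go("reset_y")
        elif c == "reset_y":                   # <y := 0>
            yield go("reset_b", new_y=0)
        elif c == "reset_b":                   # <b[i] := false>
            yield go("local", new_b=False)

def check_mutex(n):
    """Breadth-first search of all interleavings; raises if two processes are
    ever in the critical section together. Returns the number of states."""
    init = (("local",) * n, (0,) * n, (False,) * n, 0, 0)
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        if s[0].count("cs") > 1:
            raise AssertionError(f"mutual exclusion violated in {s}")
        for t in steps(s, n):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return len(seen)
```

The search keeps every reached state in one hash set, which is exactly the conventional approach whose memory demand the modular analysis avoids.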

For a model with N processes, the state space of the first automaton contains 40 + 4N states, and the state spaces of the remaining automata 2, ..., N include 20 + 2N states each. The size of the automata state spaces depends on the number of processes due to the for-loop, in which every other process is considered. Aggregation with respect to weak backward bisimulation reduces the state space of the first automaton to 32 + 4N states and the remaining state spaces to 16 + 2N states. For this example, the aggregation approach yields only a small reduction by a constant value, which reflects the complexity of the processes. Nevertheless, the aggregated reachability set is significantly smaller than the original one, although the sizes of both reachability sets grow rapidly with an increasing number of processes. Results for the example with 3, 4 and 5 processes are shown in Tab. 2. The effort to generate reachability sets for this example is significantly higher than for the philosophers example. However, for the largest configuration (N = 5) with nearly 8 million states, it takes less than an hour to generate the reachability set and represent it in a very compact form requiring about 26 KByte of memory. For N = 6, $|R| > 1.1 \cdot 10^8$ states; the reachability set could still be represented in a compact form, but generation times become extremely large.



N   |R|          |R̃|          DAG nodes   DAG size in KByte   non-zeros   Time in sec.
3   16,683       11,337       40          4.7                 11,568      18
4   347,875      222,046      84          12.9                16,042      129
5   7,856,309    4,760,531    145         25.8                21,056      2,948

Table 2. Sizes and efforts to generate and represent Lamport’s algorithm.

The example also shows the limits of the approach when processes are highly synchronized, such that abstraction cannot be applied to reduce intermediate reachability sets. However, even for this example the approach outperforms conventional state space generation, which fails completely for the model with N = 5 on the same hardware due to memory limitations.
5 Conclusion

We propose a methodology to analyze distributed software systems at the state
level in a modular way. Starting from a specification of the software system
as a P/T-net consisting of components that interact via synchronized transitions,
we generate a network of communicating automata. For the analysis of this
automata network, we apply three different concepts to manage the inherent
complexity of state-level analysis. First, we avoid the explicit generation of the
reachability graph by representing it as a sum of Kronecker products of small
automata matrices. Second, we avoid the explicit enumeration of the state space
(reachability set) by representing it as a directed acyclic graph. Third, we
integrate state-level aggregation based on equivalence relations to reduce the
automata state spaces a priori. Aggregation integrates naturally into the
Kronecker description of the reachability graph and can therefore be performed
very efficiently. Together, these concepts drastically reduce the memory
requirements for representing large reachability sets and graphs. At least from
the memory perspective, the state space explosion problem can be managed for most
models that are described by synchronously communicating components. The
situation is somewhat different if one considers the time required to build
the data structures for really large systems with several hundred million or
several billion states. As the dining philosophers example shows, it is
sometimes possible to generate data structures for such large models in a few
seconds. However, if interactions between components become more complex, so
that a priori aggregation has only a small effect on the size of the
reachability set, then handling huge state spaces is still a very
time-consuming task, as the results for the second example indicate.
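To make the first concept concrete, here is a small Python sketch (not the authors' implementation) of a reachability graph given as a sum of Kronecker products. Each term is a list of per-component 0/1 matrices: a local transition of component i contributes I ⊗ L_i or L_i ⊗ I, and a synchronized transition contributes the Kronecker product of the participating components' matrices, with identities for components that do not take part. The two-component model, its synchronized transitions t and u, and all function names are hypothetical. Successors are read off the small factor matrices only; the expanded matrix is built solely to check the result on this tiny example.

```python
from collections import deque
from itertools import product


def matrix(n, arcs):
    """n x n 0/1 matrix with a 1 for every (source, target) arc."""
    m = [[0] * n for _ in range(n)]
    for i, j in arcs:
        m[i][j] = 1
    return m


def identity(n):
    return matrix(n, [(i, i) for i in range(n)])


def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def expand(terms):
    """Explicit matrix of the sum of Kronecker products (only for small checks)."""
    full = None
    for term in terms:
        m = [[1]]
        for factor in term:
            m = kron(m, factor)
        full = m if full is None else [
            [int(x or y) for x, y in zip(r1, r2)] for r1, r2 in zip(full, m)
        ]
    return full


def successors(terms, state):
    """Successors of a global state, read off the small factor matrices."""
    result = set()
    for term in terms:
        choices = [[t for t, v in enumerate(m[s]) if v] for m, s in zip(term, state)]
        result.update(product(*choices))  # empty if any factor row is all zeros
    return result


def reachable(terms, initial):
    """Breadth-first generation of the reachability set."""
    seen, queue = {initial}, deque([initial])
    while queue:
        for nxt in successors(terms, queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical model: component 1 has states 0..2, component 2 has states 0..1.
L1 = matrix(3, [(0, 1)])                            # local transition of component 1
L2 = matrix(2, [])                                  # component 2 has no local moves
S1, S2 = matrix(3, [(1, 2)]), matrix(2, [(0, 1)])   # synchronized transition t
U1, U2 = matrix(3, [(2, 0)]), matrix(2, [(1, 0)])   # synchronized transition u
terms = [[L1, identity(2)], [identity(3), L2], [S1, S2], [U1, U2]]

print(sorted(reachable(terms, (0, 0))))  # [(0, 0), (1, 0), (2, 1)]
```

Only three of the six product states are reachable from (0, 0). The factors are 3 × 3 and 2 × 2 matrices, whereas the expanded matrix is 6 × 6; with more components the expanded size is the product of all component sizes, which is what the Kronecker form avoids storing.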

The usability of any analysis technique relies on the availability of appropriate
software tools that incorporate the technique. The modular state-level analysis is
part of a general toolbox based on two standardized file formats, one describing
general classes of Petri nets at a higher level and one describing synchronized
automata at a lower level. Apart from modules for functional system analysis, the
toolbox also includes modules for performance or reliability analysis based on
Markov chain techniques. These techniques follow ideas similar to those proposed
here for the functional case.

Future work will consider the exploitation of symmetries during reachability set
generation and the compositional computation of equivalence relations. We plan
to implement the analysis steps in a distributed way on a workstation cluster and
to further enhance the interconnection of the toolbox with other tools.
Abstract. The Edinburgh Concurrency Workbench has been the author’s responsibility
for the past four years, having been under development for eight years before
that. Over its lifetime, we have learnt many lessons and encountered many
questions about verification tool development, both from bitter experience and
from discussion with other tool developers and users. This note raises some of
them for wider discussion.



1 Introduction 

It is common to hear it said that an important factor in the practical uptake of
theoretical work in computer science is the availability of tools that incorporate
the theory; and the spread of finite-automata-based verification tools through the
US hardware verification industry is indeed one of the more widely visible signs
of recent progress in theoretical computer science. Although there are now some
cases of verification tools being taken over, or developed in house, by commercial
organisations, it is more usual for them to be developed in universities, at least
partly by people whose jobs also involve research and teaching.

The nature and range of software engineering problems encountered by developers
naturally vary with the kind of product being developed and with the nature of
the developing organisation. This note draws on a longer paper, “A Verification
Tool Developer’s Vade Mecum” (available from www.dcs.ed.ac.uk/home/pxs), which
attempted to bring out the special features of verification tool development in
universities. In this note we raise some of the questions, commenting on a few as
space permits: the aim is to promote constructive sharing of experience and views.

2 “Business case” level issues 

To begin with one of the earliest and thorniest of questions: 

Who does the development? And, more difficult, who does the maintenance
and support?

* Perdita.Stevens@dcs.ed.ac.uk, supported by EPSRC GR/K68547. Tel: +44 131 650
5195, Fax: +44 131 667 7209
— Professional researchers who are expert in the underlying theory?

— Students, with limited experience in both theory and software engineering? 
They may receive direction, but this normally comes from someone more 
expert in theory than engineering. 

— Professional software developers, employed for the purpose, under direction
from researchers, who supply the detailed theoretical understanding? Will
they have to be so minutely directed, in order to ensure that the theory is
correctly implemented, that the skilled work of system design is, in effect, again
done by the researchers, with the programmers doing only routine programming
work?

— Professional software developers who are in the process of becoming
professional researchers? This is probably unusual, but I was brought into the
Edinburgh Concurrency Workbench project after some years as an industrial
software engineer, with only undergraduate-level knowledge of the underlying
theory, and with the intention that I should acquire whatever knowledge was
necessary. My ignorance of the underlying theory posed formidable problems in
the beginning; but perhaps it is easier in a university environment to acquire
theoretical understanding than engineering understanding?

How do you find the time to spend on tool work? In varying degrees, all
of the options for tool development require a serious commitment of time and
energy. For someone pursuing an academic career, there is a tension between
producing papers and producing tools. Although universities increasingly seem
to recognise the importance of producing tools (to gain the advantages cited
below), it is very difficult, especially (but not only) for someone who is not
engaged in tool development and maintenance, to appreciate the amount of
time that is required.

To some extent, it may be possible to combine the goals of producing papers
and producing tools: there are fora, including TACAS, for presenting papers
about new tools or major new features in tools. However, much of the effort
required to maintain a tool, especially one which has many users, is the routine
(though skilled) work of updating interfaces to changing external systems,
writing documentation, answering email, developing tests, etc. This work is not
research.

Do you want to develop a tool at all? We have begun to mention the
disadvantages: it is time-consuming, it is difficult in ways which often have
little to do with research, and it may be difficult to find the resources to do
it well. Other options to consider may include:

— Developing a component of another tool set, rather than a whole new tool. 
The practicality of this will depend on the intended functionality of the tool 
and the qualities of the tool set. 

— Getting someone else to develop the tool. If your intended user group is 
industrial verifiers, can you build the tool as a collaborative venture with an 
industrial partner? 
Do you want your tool to have users? Or is it better to build a purely
experimental system, with lower support needs? 

Who are the intended users of your tool? Students and their teachers? 
Researchers? Industrial verifiers? With what needs and experience? 

What do you want to achieve? For example: 

— Technology transfer: you may want to improve the visibility of some
well-established piece of theory (among industrial practitioners, students or both).

— Theory experimentation: you may want to deepen your understanding of 
some theory by experimenting with different implementations. 

— Image manipulation: you may want yourself or your organisation to be seen 
as doing “practical” work. 

3 Architectural issues 

What high level structure is appropriate? (This is influenced, for example, 
by whether you want other people to be able to extend the tool, and if so, in 
what ways.) Specifically, 

Which decisions must be encapsulated so that they can easily be 
changed? 

What programming language is best for the purpose? Considerations will
include, for example, support for encapsulation, the type system, and the
availability of compilers on the relevant platforms.

What user interface is best? For example, do you want a graphical user
interface or not, and if so, on what toolkit should it be based, considering
maintenance and portability? Much depends on who the users are.



4 Issues concerning the development process and QA 

A quality assurance process suitable for academic development of verification 
tools needs to be extremely streamlined. A meta-question is 

What documentation of the process is useful? This depends on, for example,
the group of people involved, their distribution and turnover.

Let us consider a couple of important areas. 

Version control This is important for all systems, but particularly important for
verification tools, where correctness is paramount. I find it helpful to
version-control everything: code, build files, documentation, tests, “correct”
answers to tests, etc. To make this feasible given resource constraints, the
version control system has to be very easy to use, so that one can check
something in and keep working on it without interrupting a chain of thought.
Testing It goes without saying that a verification tool needs to be thoroughly
tested; but the effort required to do this is often underestimated. In mainstream
software engineering, the usual estimate of the share of a software development
project’s budget that is spent on testing is 30 to 50%; verification tools can be
expected to be towards the upper end of this range. It is extremely tempting to
cut corners here, and so it is crucial that all time spent on testing is used as
effectively as possible. Some automated support for regression testing is probably
essential; the CWB’s system testing software is written in Perl, a language
which is well adapted to this kind of task. This simple program enables the
CWB developer to run tests and spot newly introduced problems with minimal
effort. It is simple-minded: semantically insignificant changes to CWB output,
such as printing nominally unordered output items in a different order, are
reported as errors. However, it has not yet seemed efficient to implement anything
more sophisticated, which brings us to the next question:

What is it efficient to automate? It seems worth remarking that there is a 
danger of losing time by automating things, too. For example, after making two 
minor errors in releasing versions of the CWB, I developed a script to automate 
the release process, when in fact I would have been better off with a checklist 
of things to do when releasing the CWB: this would have solved the original 
problem more robustly with less effort. 
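The kind of regression testing described under Testing can be sketched as follows. This is a hypothetical Python illustration, not the CWB’s Perl script: each test case pairs an input with a stored “correct” answer, and an optional per-test flag (our own assumption) makes the comparison treat output lines as an unordered multiset, so that a reordering of nominally unordered items is not reported as an error. The names compare, run_suite and fake_tool are invented for the example.

```python
import difflib
from collections import Counter


def compare(expected, actual, unordered=False):
    """Return human-readable differences; an empty list means the test passed."""
    exp_lines, act_lines = expected.splitlines(), actual.splitlines()
    if unordered:
        # Compare as multisets of lines, so a mere reordering is not an error.
        missing = Counter(exp_lines) - Counter(act_lines)
        extra = Counter(act_lines) - Counter(exp_lines)
        return ([f"missing: {line}" for line in sorted(missing.elements())]
                + [f"unexpected: {line}" for line in sorted(extra.elements())])
    return list(difflib.unified_diff(exp_lines, act_lines, "expected", "actual",
                                     lineterm=""))


def run_suite(cases, run_tool):
    """cases maps a test name to (input, expected output, unordered flag)."""
    failures = {}
    for name, (tool_input, expected, unordered) in cases.items():
        diffs = compare(expected, run_tool(tool_input), unordered)
        if diffs:
            failures[name] = diffs
    return failures


# Stand-in for the tool under test: prints its input words in reverse order.
def fake_tool(text):
    return "\n".join(reversed(text.split()))


cases = {
    "states": ("a b c", "a\nb\nc", True),   # order irrelevant: passes
    "trace": ("a b c", "a\nb\nc", False),   # order matters: fails
}
print(sorted(run_suite(cases, fake_tool)))  # ['trace']
```

Whether even this small refinement repays its maintenance cost is, as argued above, a judgement each tool developer has to make.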

Other issues in the development process that may need to be considered 
include 

What coding practices are required? 

What kind of documentation is needed? Writing and maintaining documentation
is one of the most time-consuming aspects of tool development, so this needs
particularly careful consideration.

5 Dissemination issues 

These issues are relevant if you are developing a tool which is to have external users.

What kind of distribution policy is appropriate? 

What kind of support will you offer? 

In conclusion. 

What does your own experience suggest as answers to any of these 
questions? And what other questions are crucial? What is the single 
most important piece of advice to give to tool developers? 
1 ETI’s Goals

The Electronic Tool Integration platform (ETI) associated with the International
Journal on Software Tools for Technology Transfer (STTT) [7] is designed for the
interactive experimentation with and coordination of heterogeneous tools^. ETI
users are assisted by an advanced, personalized Online Service that guides
experimentation, coordination and simple browsing of the available tool
repository according to their degree of experience. In particular, this allows
even newcomers to orient themselves in the wealth of existing tools and to
identify the most appropriate collection of tools for solving their own
application-specific tasks.

Typical users of the ETI platform are tool builders and members of software
projects looking for adequate tool support in their project area, but also
researchers and scientists interested in tools as a research aid. The
effectiveness of the ETI approach depends on the richness of the ETI repositories,
which steadily grow with the integration of new tools, transformations and
benchmarks.

A more detailed exposition, including background and related work, can be
found in [5].

2 The ETI Online Service 

ETI contains and manages a heterogeneous wealth of information, functionalities 
and data. From the ETI Service homepage, http://eti.cs.uni-dortmund.de, 
it is possible to 

1. access online information on the tools via hyperlinks to each tool’s home
site. These sites may provide information (documentation, literature, user
manuals, prominent case studies) or, depending on the tool providers’ choice,
even the tool’s code (executable or source).

2. access online a stand-alone version of each tool, centrally located at the 
ETI service sites and executing (running) there. 

3. access the ETI repository of integrated tools. The platform’s repository
contains a collection of functionalities offered by each integrated tool,
classified for ease of retrieval according to behavioural and interfacing criteria.

4. experiment at ease with the integrated tools, by 

^ The ETI platform is realized on top of the METAFrame environment [6,4].
(a) running the (stand-alone or integrated) tools on libraries of examples,
case studies, and benchmarks made available on the ETI platform, 

(b) testing and running single tool functionalities, capturing specific features 
offered by the integrated tools, on the same examples, from within a 
uniform graphical user interface provided by ETI, 

(c) constructing their own application-specific heterogeneous tools by combining
single functionalities available on the ETI platform. In this way users can
prototypically solve problems which require the cooperation of several
integrated tools, and experience the interplay of the integrated
functionalities,

(d) loosely specifying coordination tasks, which can then be automatically
completed by means of ETI’s coordination support. This, in particular,
takes care of any type (data format) incompatibilities, as detailed in [3].

5. experiment with their own sets of data, to be deployed in user-specific,
protected home areas.

The tool demonstration focusses on ETI’s unique support for high-level tool
coordination, while illustrating the steadily improving features of
personalization, statistical analysis and automatic evaluation.

3 Experimentation by Loose Coordination 

Tool coordination is freed from any programming and technicalities, so that little
or no specific knowledge is a prerequisite for using ETI as a coordination
environment. In particular, ETI provides high-level task specification languages,
graphical support for specifications and user interaction, as well as automatic
coordination support by means of automatic synthesis and prototype animation [3].
This eases access to and use of the functionalities offered by different tools,
implemented in different languages of different programming paradigms
(functional, imperative, object-oriented) and running on different platforms.

Perhaps the most prominent application of loose coordination is the type-based
completion of type-incorrect tool combinations. In a heterogeneous collection of
tools, with all its advanced input and output formats, exact and correct tool
coordination is extremely difficult. ETI provides convenient coordination
interfaces even for newcomers, whose attention is kept free from typing
constraints so that they can concentrate on the desired functionalities.

The formal backbone of loose coordination is model synthesis/construction
for Linear Time Temporal Formulas [5]. Users of the ETI service need not know
this logic. Rather, they may choose between several specification formats, such
as the above-mentioned type-incomplete coordination sequences, graphical formats
or (application-specifically) derived logics. All these specification formats
can be handled automatically by our synthesis mechanism, which, in particular,
transforms type-incorrect coordination sequences into directly executable
ones [3].
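The idea of type-based completion can be illustrated with a much simpler mechanism than the SLTL-based synthesis of [3, 5]: treat every tool functionality as an arc from its input type to its output type, and fill each type gap in a loose sequence with a shortest chain of converters found by breadth-first search. The Python sketch below is such a simplification, not ETI’s algorithm; the tool names, data types and the (name, input type, output type) format are hypothetical. Its minimal_chains function returns all shortest chains, loosely mirroring the idea of presenting the set of minimal coordination sequences.

```python
from collections import deque


def minimal_chains(tools, source, target):
    """All shortest chains of tool names turning data of type source into target.

    tools is a list of (name, input_type, output_type) triples.
    """
    dist, preds, queue = {source: 0}, {source: []}, deque([source])
    while queue:
        t = queue.popleft()
        for name, inp, out in tools:
            if inp != t:
                continue
            if out not in dist:                 # first time this type is reached
                dist[out], preds[out] = dist[t] + 1, [(name, t)]
                queue.append(out)
            elif dist[out] == dist[t] + 1:      # another shortest way to reach it
                preds[out].append((name, t))
    if target not in dist:
        return []

    def paths(t):
        if t == source:
            return [[]]
        return [p + [name] for name, prev in preds[t] for p in paths(prev)]

    return paths(target)


def complete(sequence, tools):
    """Insert converter chains between adjacent tools whose types do not match."""
    signature = {name: (inp, out) for name, inp, out in tools}
    result = [sequence[0]]
    for prev, nxt in zip(sequence, sequence[1:]):
        have, need = signature[prev][1], signature[nxt][0]
        if have != need:
            chains = minimal_chains(tools, have, need)
            if not chains:
                raise ValueError(f"no converter chain from {have} to {need}")
            result.extend(chains[0])
        result.append(nxt)
    return result


TOOLS = [  # hypothetical repository entries: (name, input type, output type)
    ("ccs2lts", "CCS", "LTS"),
    ("lts2aut", "LTS", "AUT"),
    ("lts2dot", "LTS", "DOT"),
    ("aut2pic", "AUT", "Picture"),
    ("render", "DOT", "Picture"),
    ("check", "AUT", "Verdict"),
]

print(complete(["ccs2lts", "check"], TOOLS))  # ['ccs2lts', 'lts2aut', 'check']
```

Breadth-first search keeps the inserted chains minimal; a temporal-logic synthesis such as the one described here can additionally express ordering and property constraints that a plain type graph cannot.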

ETI can be operated without any previous knowledge about the content of
the current tool repository. In fact, besides incompleteness (looseness) in the
above-described fashion (along a coordination execution), looseness in the
specification of single functionalities allows newcomers to specify tools just
by selecting desired properties. The system will then return the set of all
corresponding (satisfying) functionalities, which can be investigated online
using a hypertext documentation system.
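Selecting functionalities by desired properties can be sketched as a query over a classification taxonomy, in the spirit of the behavioural and interfacing classification of the ETI repository mentioned above. In this hypothetical Python sketch a taxonomy maps each class to its more general parent class, and a functionality satisfies a property if it is classified under that property or one of its subclasses; every class and functionality name is invented for illustration.

```python
PARENT = {  # hypothetical activity taxonomy: class -> more general class
    "bisimulation_minimization": "minimization",
    "trace_minimization": "minimization",
    "minimization": "transformation",
    "model_checking": "verification",
    "equivalence_checking": "verification",
}


def generalizations(cls, parent):
    """The class itself together with all of its ancestors in the taxonomy."""
    result = {cls}
    while cls in parent:
        cls = parent[cls]
        result.add(cls)
    return result


def select(functionalities, wanted, parent):
    """Names of all functionalities that satisfy every wanted property."""
    hits = []
    for name, classes in functionalities.items():
        satisfied = set().union(*(generalizations(c, parent) for c in classes))
        if set(wanted) <= satisfied:
            hits.append(name)
    return sorted(hits)


REPOSITORY = {  # hypothetical functionalities and their classifications
    "toolA.minimize": {"bisimulation_minimization", "aut_input"},
    "toolB.reduce": {"trace_minimization", "lts_input"},
    "toolC.check": {"model_checking", "aut_input"},
}

print(select(REPOSITORY, {"minimization"}, PARENT))  # ['toolA.minimize', 'toolB.reduce']
print(select(REPOSITORY, {"transformation", "aut_input"}, PARENT))  # ['toolA.minimize']
```

Selecting a general class returns every functionality beneath it, which is what lets a newcomer start from a vague wish and narrow it down.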

Figure 1 shows a screenshot summarizing the essential features and components
offered by the ETI Online Service via standard browsers in a platform-independent
fashion.^ A browser (upper left corner) serves as the documentation facility
(white frame), the console (upper frame), and the entry point for the other
service functionalities (menu in the left frame). The core functionalities are
invoked via the tool bar:

— the type and activity taxonomy browsers, 

— the synthesis editor, 

— the graph editors. 

The screenshot additionally displays a synthesized coordination graph (upper
right corner) and an example graph, reachable via the graph editor (lower right
corner).

Experts may use the coordination system in an even more flexible manner: they
may use the full power of the SLTL linear time temporal logic [5] and request,
e.g., the presentation of the set of all (minimal) coordination sequences as
feedback. This way, they may investigate the full potential of the ETI repository
by successively refining the logical specification.

Thus people with different programming skills and professional profiles are 
able to profitably develop and test even complex tool coordination structures in 
a comfortable, intuitive manner. 



4 Conclusions and Perspectives 

The ETI Online Service plays a public service role, giving users the possibility of
direct, hands-on experience with a wealth of available tools and functionalities.
This also includes features like the ETI Online Forum, where users may, for
example, propose case studies and report on their experiences [2]. The service is
intended to develop into an independent tool presentation and evaluation site:
potential customers (or project partners) are intended to use the service as a

— directory for possible tools and algorithms that fully or partially satisfy
their needs,

— (vendor- and producer-) independent test site for trying and comparing
alternative products and solutions, which may be accessed without the overhead
of getting demo copies, demo licenses, making own installations, etc.,

^ A typical user will not be confronted with all these windows at once, as they only 
appear on demand. 
— quality assessment site for the published tools, which are refereed according
to requirements like originality, usability, installability, stability,
performance, design, etc.,

— independent benchmarking site for performance on a growing basis of problems
and case studies.

This should simplify the communication between tool builders and tool users as
well as between academia and industrial practice, supporting the transfer of
tool-related technology. In fact, we are optimistic that the typical hesitation to
try out new technologies can be overcome because serious hurdles, like installing
the tools, getting acquainted with new user interfaces, and the lack of direct
comparability of results and performance, are eliminated. Moreover, the intended
collaborative effort of the ETI user community to provide easily accessible
information about fair, application-specific evaluations of various competing
tools on the basis of predefined benchmarks will be of inestimable help for
everybody in need of tool support.